Detection of a structural change in a regression model has
recently attracted considerable interest in the literature on
both econometrics and statistics. Perron's hypothesis, that
standard tests of the unit root hypothesis against trend
stationary alternatives cannot reject the unit root hypothesis
if the series has a structural break at some intermediate date,
has initiated a series of testing procedures for parameter
constancy in nonstationary time series. After Christiano's
criticism of Perron's a priori known break point, the empirical
results from several testing methods endogenizing the break
point selection procedure have provided evidence against
Perron's hypothesis.

Bayesian inferential procedures for detecting a structural
break in dynamic models are more manageable than those of the
classical approach. Also, the Bayesian inferential theory is
largely unaffected by the presence of unit roots. In this
dissertation I review the Bayesian inferential procedure for
detecting a structural change in autoregressive models with a
Monte Carlo study and apply this methodology to the GNP series
of the OECD countries.

Monte Carlo studies about detecting a structural change in the
autoregressive model show that the Bayesian posterior mass
function of m detects a break point more readily than the
classical approach even when the series is nonstationary. When
a peak of the marginal posterior mass function of m occurs
within a sample period, it indicates a break point.

Using Bayesian inference, I found strong evidence supporting
Perron's hypothesis even after endogenizing the break point
selection procedure. The results for Canada, France, Italy, and
Japan show that standard tests of the unit root hypothesis were
biased in favor of accepting the unit root hypothesis because
of a structural break.

CHAPTER 1


INTRODUCTION 

Detection  of  a structural  change  in  a regression  model 
has  recently  attracted  considerable  interest  in  the 
literature  on  both  econometrics  and  statistics.  The 
conventional  statistical  tests  for  parameter  constancy  in 
stationary  time  series  suffer  from  the  lack  of  an 
appropriate  asymptotic  theory  when  a structural  break  is 
assumed  to  be  unknown.  Recently  the  functional  central  limit 
theorem  and  the  continuous  mapping  theorem  have  been  used  to 
shed  light  on  the  asymptotic  theory  for  testing  parameter 
constancy. 

Perron's  (1989)  hypothesis,  that  standard  tests  of  the 
unit  root  hypothesis  against  trend  stationary  alternatives 
cannot  reject  the  unit  root  hypothesis  if  the  series  has  a 
structural  break  at  some  intermediate  date,  has  initiated  a 
series  of  testing  procedures  for  parameter  constancy  in 
nonstationary  time  series.  After  Christiano's  (1988) 
criticism  about  Perron's  a priori  known  break  point,  the 
empirical  results  from  several  testing  methods  endogenizing 
the  break  point  selection  procedure  have  provided  evidence 
against  Perron's  hypothesis. 

Classical testing procedures for parameter constancy require
the sophisticated concepts of Brownian motion to obtain an
asymptotic theory, especially for the nonstationary case.
Tabulating the critical values of test statistics is also
extremely difficult since the test statistics are usually
functions of Wiener processes.

Bayesian  inferential  procedures  for  detecting  a 
structural  break  in  dynamic  models  are  more  manageable  than 
those  of  the  classical  approach.  Also,  the  Bayesian 
inferential  theory  is  largely  unaffected  by  the  presence  of 
unit  roots.  In  this  dissertation  I review  the  Bayesian 
inferential  procedure  for  detecting  a structural  change  in 
autoregressive  models  with  a Monte  Carlo  study  and  apply 
this methodology to the data sets analyzed by Banerjee,
Dolado, and Galbraith (1990) and Banerjee, Lumsdaine, and
Stock (1990).

The flat-prior Bayesian approach has been criticized by
Phillips (1990), who suggested an alternative ignorance prior
based on Jeffreys' theory of invariance. I discuss this
criticism and report some Monte Carlo results, arguing that
Phillips' ignorance prior is not better than the flat prior.

Chapter 2 reviews the classical testing procedures for
parameter constancy in stationary and nonstationary cases.
Chapter 3 derives the Bayesian inferential procedure for
detecting a structural change in autoregressive models and
reports some Monte Carlo evidence for identification of a
break point. Chapter 4 reviews previous results obtained with
the classical approach. Chapter 5 reports the empirical results
of reanalyzing the data sets of Chapter 4 with the Bayesian
methodology derived in Chapter 3. Chapter 6 discusses the
criticism of flat-prior Bayesian inference and of the classical
approach in the presence of unit roots. Chapter 7 concludes the
dissertation.


CHAPTER  2 


TESTS  FOR  PARAMETER  CONSTANCY 


This  chapter  describes  and  compares  different  types  of 
classical  techniques  for  testing  the  null  hypothesis  of 
constant  parameters  over  time  when  regression  analysis  is 
applied  to  time-series  data.  First  I will  discuss  the 
testing  procedures  for  parameter  constancy  in  the  regression 
model  with  stationary  regressors  in  Section  2.1.  I will 
briefly  sketch  how  to  get  an  asymptotic  theory  and  review 
conventional  tests  for  parameter  constancy  against  the 
different  characterizations  of  alternative  hypotheses  — a 
single  structural  change  or  general  unspecified 
alternatives.  In  Section  2.2  I will  sketch  two  different 
approaches  to  obtaining  a nuisance  parameter  free  limiting 
distribution  and  then  discuss  several  testing  procedures  for 
parameter  constancy  in  nonstationary  time  series. 


2.1.  Stationary  Regressors 

The conventional tests for parameter constancy can be
classified into two approaches according to the specification
of the alternative hypothesis. One approach tests the null
hypothesis of constant coefficients against the alternative
that a single structural change has occurred at some unknown
time. The other tests against general unspecified structural
changes such as varying parameters and random walk
coefficients.

Before reviewing the tests, I will briefly sketch how the
asymptotic distribution is obtained, since this derivation is
the essential part of most recent papers on testing procedures
for structural change.


2.1.1.  Brief  Sketch  of  Asymptotic  Theory 

Recent developments in functional central limit theory
(FCLT) and the continuous mapping theorem (CMT) have
provided the missing tools to develop an asymptotic theory.
The following procedure for obtaining an asymptotic theory can
be found in most recent studies on testing for structural
change.

1) First express the sequence of a test statistic as a
function of r = m/T, where m is a split point. The sequence is
viewed as a function of r for values in K, some compact subset
of (0,1).

2) By the functional central limit theorem, the partial sums
of variables in the sequence of a test statistic can be
replaced asymptotically by a Wiener process, W(r). The
sequence of a test statistic then becomes a function of Wiener
processes. Note that the FCLT applies only to change points in
some closed region within (0,1). This means that change points
too close to the sample boundaries cannot be tested (see
footnote 1).

3) The test statistic is some function f(·) of the sequence,
for example,

f(·) = max: for detecting a single change point
f(·) = mean: for general unspecified alternatives.

The asymptotic distribution of the test statistic can be
obtained by the continuous mapping theorem. It is usually a
function of Brownian bridges, W°(r).

4) The hitting probability of the Brownian bridge is given by
a well-known distribution function. For a k-dimensional
Brownian bridge B(r),

$$
\Pr\Big[\sup_{0 \le r \le 1} \|B(r)\|_\infty \le x\Big] =
\begin{cases}
0, & x < 0 \\
\Big[1 + 2\sum_{i=1}^{\infty} (-1)^i \exp(-2 i^2 x^2)\Big]^k, & x \ge 0
\end{cases}
$$

This arises in various contexts. For instance, for k=1 it
equals the null distribution of the Kolmogorov-Smirnov test
(Billingsley, 1968). Alternatively, the limiting distribution
can be approximated by Laplace transformation or some
expansion methods.

1 Andrews (1990) points out that evaluating the sequence of
test statistics at points bounded away from the beginning and
end of the sample is an important component of the asymptotic
theory. Andrews suggested the practical rule of taking
K = [.15, .85], which can be compared with [.10, .90] in Kim
and Siegmund (1989).
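
As an illustration of step 4, the following minimal sketch evaluates the series for the distribution of the supremum of a Brownian bridge and inverts it by bisection. It assumes that the k-dimensional max-norm case is the scalar series raised to the power k; the function names, the truncation at 100 terms, and the bisection bracket are illustrative choices, not part of the original derivation.

```python
import math

def sup_bridge_cdf(x, k=1, terms=100):
    """P(sup_r ||B(r)||_inf <= x) for a k-dimensional Brownian bridge.

    Assumes the k components are independent, so the scalar
    Kolmogorov series is raised to the power k.
    """
    if x <= 0:
        return 0.0
    series = 1.0 + 2.0 * sum(
        (-1) ** i * math.exp(-2.0 * i * i * x * x) for i in range(1, terms + 1)
    )
    # Guard against tiny negative rounding errors for small x.
    return max(series, 0.0) ** k

def critical_value(alpha, k=1, lo=0.2, hi=5.0, tol=1e-10):
    """Upper-tail critical value c with P(sup > c) = alpha, by bisection."""
    target = 1.0 - alpha
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sup_bridge_cdf(mid, k) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Calling `critical_value(0.05, k=1)` gives the familiar 5% Kolmogorov-Smirnov point; other values of k can be compared with tabulated critical values of tests whose limit is the maximum norm of a Brownian bridge.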


2.1.2.  Single  structural  change 

The traditional Chow (1960) test was developed to test the
null hypothesis of parameter constancy against the alternative
of a break at a point known a priori, under the assumption of
constant variances. The papers by Quandt (1958, 1960) discuss
testing the null hypothesis of constant coefficients against a
more general alternative, where a structural change has
occurred at some unknown time and the error variance is also
allowed to change. However, Quandt's likelihood ratio testing
procedure suffers from the lack of a distribution theory.
Recently the functional central limit theorem and the
continuous mapping theorem have been used to shed light on the
asymptotic distribution theory of this test. Here I will
review Quandt's LR test and then the Kim and Siegmund (1989)
and Chu (1989) testing procedures.


2.1.2.1. Quandt's LR Test


In this case the observations are thought, for theoretical
reasons, to have been generated by two distinct regression
regimes. Thus, for some subset I of the T observations,

$$ y_i = x_i'\beta_1 + u_{1i} \qquad (i \in I) $$

and for the complementary subset J

$$ y_j = x_j'\beta_2 + u_{2j} \qquad (j \in J) $$

The essence of this simple formulation is that all the
observations up to the unknown time m come from one regime and
all the observations after that point come from the other. On
the assumption that m is the time at which the switch from one
regime to the other occurred, the likelihood of the sample can
be written as

$$
L = \Big(\frac{1}{\sqrt{2\pi}\,\sigma_1}\Big)^{m}
    \Big(\frac{1}{\sqrt{2\pi}\,\sigma_2}\Big)^{T-m}
    \exp\Big[-\frac{1}{2\sigma_1^2}(y_m - X_m\beta_1)'(y_m - X_m\beta_1)
             -\frac{1}{2\sigma_2^2}(y_{T-m} - X_{T-m}\beta_2)'(y_{T-m} - X_{T-m}\beta_2)\Big]
$$

where $y_m$ and $X_m$ collect the first m observations and
$y_{T-m}$ and $X_{T-m}$ the remaining T - m observations.

The value of L may be evaluated for all possible choices of m,
and the value of m that maximizes L is chosen as the estimate
of the unknown switching point. If we wish to test the
hypothesis that there is no switch in regimes against the
alternative of one switch, the appropriate likelihood ratio is

$$
\log\lambda = \tfrac{1}{2} m \log\hat\sigma_1^2
            + \tfrac{1}{2}(T-m)\log\hat\sigma_2^2
            - \tfrac{1}{2} T \log\hat\sigma^2
$$

where $\hat\sigma_1^2$, $\hat\sigma_2^2$ and $\hat\sigma^2$ are
the maximum likelihood variance estimates from the first m
observations, the remaining T - m observations, and the full
sample, respectively.

The estimate of the point at which the switch from one
relationship to another has occurred is then the value of m at
which $\lambda$ attains its minimum. However, implementation of
this procedure has been hindered by the lack of a distribution
theory. Quandt (1960) showed empirically, on the basis of a
Monte Carlo experiment, that the proposed $\chi^2$
approximation to the distribution of $-2\log\lambda$, and hence
to the significance level of the likelihood ratio test, is very
poor.
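
The search over candidate switch points can be sketched as follows. This is a minimal illustration, assuming least squares fits in each regime, maximum likelihood (divide-by-n) variance estimates, and a hypothetical `min_obs` trimming parameter that keeps both regimes estimable; it is not a reproduction of Quandt's original computations.

```python
import numpy as np

def _ml_variance(y, X):
    """ML error variance (RSS / n) from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid / len(y)

def quandt_log_lr(y, X, min_obs=None):
    """Return {m: log lambda(m)} for each admissible switch point m.

    m is the number of observations assigned to the first regime.
    """
    T, k = X.shape
    if min_obs is None:
        min_obs = k + 1  # assumption: smallest segment with a positive RSS
    s2_full = _ml_variance(y, X)
    out = {}
    for m in range(min_obs, T - min_obs + 1):
        s2_1 = _ml_variance(y[:m], X[:m])
        s2_2 = _ml_variance(y[m:], X[m:])
        out[m] = (0.5 * m * np.log(s2_1)
                  + 0.5 * (T - m) * np.log(s2_2)
                  - 0.5 * T * np.log(s2_full))
    return out

def quandt_break_estimate(y, X, min_obs=None):
    """Switch point estimate: the m at which log lambda is smallest."""
    values = quandt_log_lr(y, X, min_obs)
    return min(values, key=values.get)
```

Because the unrestricted likelihood can never be smaller than the restricted one, every value returned by `quandt_log_lr` is non-positive, which is a quick sanity check on an implementation.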


2.1.2.2. Kim and Siegmund Test

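Kim and Siegmund (1989) examined likelihood ratio tests to
detect a structural change in simple linear regression when
the alternative, $H_1$, specifies that only the intercept
changes and when the alternative, $H_2$, permits both the
intercept and the slope to change.

They show that the generalized likelihood ratio test of $H_0$
against $H_1$ rejects $H_0$ for large values of

$$ \max_{T_0 \le t \le T_1} |U_T(t)| / \hat\sigma $$

where

$$
U_T(t) = \Big(\frac{t}{1 - t/T}\Big)^{1/2}
         \frac{\bar y_t - \bar y_T - \hat\beta(\bar x_t - \bar x_T)}{K(x)}
       = (\hat\alpha_1 - \hat\alpha_2)\,[t(1 - t/T)]^{1/2}\,K(x)
$$

with

$$
K(x) = \Big[1 - \frac{t(\bar x_t - \bar x_T)^2}
                     {(1 - t/T)\sum_{k=1}^{T}(x_k - \bar x_T)^2}\Big]^{1/2}
$$

Here $\bar y_t$ and $\bar x_t$ are the means of the first t
observations, $\hat\beta$ is the least squares slope under
$H_0$, and $\hat\alpha_1$ and $\hat\alpha_2$ are the intercepts
estimated before and after t with a common slope.

If it were known that the only possible value of the change
point m is m = t, the appropriate two-sample statistic for
testing $H_0$ would be $|U_T(t)|/\hat\sigma$. The maximization
of $|U_T(t)|/\hat\sigma$ over t searches for the unknown value
of m.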
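
The statistic can be illustrated numerically. The sketch below assumes that $U_T(t)$ can be written in two algebraically equivalent ways: from the full-sample (null) slope, and from the intercept gap in a regression with separate intercepts and a common slope. The function names are hypothetical, and the scale estimate used in `max_abs_u` (null-model residual standard deviation with divisor T) is an assumption made only for illustration.

```python
import numpy as np

def _k_factor(x, t):
    """K(x) = [1 - t (xbar_t - xbar_T)^2 / ((1 - t/T) * sum (x_k - xbar_T)^2)]^(1/2)."""
    T = len(x)
    q = np.sum((x - x.mean()) ** 2)
    return np.sqrt(1.0 - t * (x[:t].mean() - x.mean()) ** 2 / ((1.0 - t / T) * q))

def u_null_slope_form(y, x, t):
    """U_T(t) computed from the slope estimated under H0."""
    T = len(y)
    q = np.sum((x - x.mean()) ** 2)
    beta0 = np.sum((x - x.mean()) * (y - y.mean())) / q
    gap = y[:t].mean() - y.mean() - beta0 * (x[:t].mean() - x.mean())
    return np.sqrt(t / (1.0 - t / T)) * gap / _k_factor(x, t)

def u_intercept_gap_form(y, x, t):
    """U_T(t) computed from the two intercepts under a common slope."""
    T = len(y)
    before = (np.arange(T) < t).astype(float)
    Z = np.column_stack([before, 1.0 - before, x])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return (coef[0] - coef[1]) * np.sqrt(t * (1.0 - t / T)) * _k_factor(x, t)

def max_abs_u(y, x, t0, t1):
    """max over t0 <= t <= t1 of |U_T(t)| / sigma_hat, and the maximizing t."""
    T = len(y)
    q = np.sum((x - x.mean()) ** 2)
    beta0 = np.sum((x - x.mean()) * (y - y.mean())) / q
    resid = (y - y.mean()) - beta0 * (x - x.mean())
    sigma_hat = np.sqrt(resid @ resid / T)  # assumption: ML scale under H0
    stats = {t: abs(u_null_slope_form(y, x, t)) / sigma_hat for t in range(t0, t1 + 1)}
    t_hat = max(stats, key=stats.get)
    return stats[t_hat], t_hat
```

Agreement of the two forms at every t is a useful check that the normalizing factor K(x) has been coded correctly.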

They derived the asymptotic distribution of the test statistic
by the following procedure, which is the same as that in
Section 2.1.1.

1) The sequence of the test statistic can be written as
$|U_T(Tr)|/\hat\sigma$.

2) By the functional central limit theorem the sequence
converges weakly, over the range of r considered, to
$[r(1-r)]^{-1/2} W^\circ(r)$, where $W^\circ(r)$ is the
Brownian bridge.


3)  Kim  and  Siegmund  choose  max  function  as  a mapping  from 
D[0,1]  to  R.  The  asymptotic  distribution  is 


p(  max  — -it) 


T0itiTx 

where 


) 2b{  l~^r)  ( 2 ] /p(t)v[(l^2ii(t)  ) 2]dt 
71  1 i l-c2 


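$$
\nu(x) = 2x^{-2}\exp\Big\{-2\sum_{n=1}^{\infty} n^{-1}
         \Phi\big(-\tfrac{1}{2}\,x\,n^{1/2}\big)\Big\} \qquad (x > 0),
$$

$$
\mu(t) = \frac{1}{2t(1-t)\,[1 - g^2(t)\,t(1-t)]},
$$

$$
g(t) = \frac{t^{-1}\int_0^t f(u)\,du - \int_0^1 f(u)\,du}
            {(1-t)\Big[\int_0^1 f^2(u)\,du - \Big(\int_0^1 f(u)\,du\Big)^2\Big]^{1/2}}
$$

and $\Phi$ denotes the standard normal distribution function.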

4) Instead of using the hitting probability of the Brownian
bridge, Kim and Siegmund approximated the special function
$\nu(x)$ as

$$ \nu(x) \approx \exp(-cx) $$

where c is a numerical constant approximately equal to 0.583.
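
The quality of this approximation can be checked directly. The sketch below assumes the series definition $\nu(x) = 2x^{-2}\exp\{-2\sum_{n\ge 1} n^{-1}\Phi(-\tfrac12 x n^{1/2})\}$ and evaluates it by truncating the sum once the terms become negligible; the function names and the stopping rule are illustrative choices.

```python
import math

def nu_series(x, tol=1e-14, max_terms=2_000_000):
    """nu(x) = 2 x^-2 exp(-2 * sum_n Phi(-x sqrt(n) / 2) / n), for x > 0."""
    if x <= 0:
        raise ValueError("x must be positive")
    total = 0.0
    for n in range(1, max_terms + 1):
        # Phi(-a) = 0.5 * erfc(a / sqrt(2)) with a = x * sqrt(n) / 2
        term = 0.5 * math.erfc(0.5 * x * math.sqrt(n) / math.sqrt(2.0)) / n
        total += term
        if term < tol:
            break
    return 2.0 / (x * x) * math.exp(-2.0 * total)

def nu_approx(x, c=0.583):
    """Exponential approximation nu(x) ~ exp(-c x)."""
    return math.exp(-c * x)
```

For moderate arguments such as x = 0.5 or x = 1 the two functions agree closely, which is why the approximation is convenient when the tail probability has to be integrated numerically.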

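Similarly, the likelihood ratio test of $H_0$ against $H_2$
rejects for large values of

$$ \max_{t_0 \le t \le t_1} \frac{U_{1T}^2(t) + U_{2T}^2(t)}{\hat\sigma^2} $$

where

$$
U_{jT}(t) = \frac{a_j'\hat\beta_{jt}}{[a_j'(X_{jt}'X_{jt})^{-1}a_j]^{1/2}},
\qquad \hat\beta_{jt} = (X_{jt}'X_{jt})^{-1}X_{jt}'y, \qquad j = 1, 2,
$$

with

$$ a_1' = (1, -1, 0), \qquad a_2' = (0, 1, 0, -1), $$

$$
X_{1t} = \begin{pmatrix}
1 & 0 & x_1 \\
\vdots & \vdots & \vdots \\
1 & 0 & x_t \\
0 & 1 & x_{t+1} \\
\vdots & \vdots & \vdots \\
0 & 1 & x_T
\end{pmatrix},
\qquad
X_{2t} = \begin{pmatrix}
1 & x_1 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots \\
1 & x_t & 0 & 0 \\
0 & 0 & 1 & x_{t+1} \\
\vdots & \vdots & \vdots & \vdots \\
0 & 0 & 1 & x_T
\end{pmatrix}
$$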

1) The sequence of the test statistic is
$\hat\sigma^{-1}[U_{1T}(Tr), U_{2T}(Tr)]$.

2) By the functional central limit theorem the sequence
converges weakly on $[r_1, r_2]$ to

$$ [r(1-r)]^{-1/2}\,[W_1^\circ(r), W_2^\circ(r)] $$

3)  The  asymptotic  distribution  of  the  maximum  of  the 
sequence  is  by  the  continuous  mapping  theorem 

T-6  tl  2,1 


(2n)  "1jb2 3  (1  - -^7 ) 2 f f H ( t,  6)  v [ ( 2c2^/  6)- ) 2 1 dQdt 

t0  o 1 C 

4) For calculating the critical values of the limiting
distribution, $\nu(x)$ is approximated as mentioned above.

Table 1 shows the critical values for both test statistics
from a Monte Carlo experiment with 10,000 repetitions.


Table 1. Critical Values of the Kim-Siegmund Test Statistic

T     Prob.   vs. H1   vs. H2
20    .10     2.66     2.96
20    .05     2.84     3.14
20    .01     3.19     3.44
40    .10     2.76     3.12
40    .05     2.98     3.33
40    .01     3.43     3.73


2.1.2.3. Chu Test


Chu  (1989)  derived  the  asymptotic  distribution  of 
Quandt's  LR  statistic  in  a multiple  regression  model  using 
the  same  procedure  described  in  section  2.1.1. 

1)  The  square  root  of  the  sequence  of  the  LR  statistic  is 

yfT 

where 

Dt=  ( ^ XTtXc  ) -1 

and  VT  is  a consistent  estimator  of  V=lim  EiS^S?]  / T and 

st='J2  Ut. 

2) By the FCLT the square root of the sequence converges
weakly to $W^\circ(r)$, a p-dimensional Brownian bridge.

3) By the continuous mapping theorem the maximum of the square
root of the sequence converges to $\sup_r \|W^\circ(r)\|$, so
the maximum LR test statistic converges:

$$
\max_{p \le k \le T-p} 2\ln\lambda(k) \Rightarrow
\sup_{r \in (0,1)} \Big\|\frac{W^\circ(r)}{\sqrt{r(1-r)}}\Big\|^2
$$

4) The critical values of the limiting distribution can be
obtained from the hitting probability of the Brownian bridge.

Chu  proposed  two  test  statistics  for  testing  constancy 
of  a trend  coefficient 

7,  ■ max  <-££>  (^)3<|5„  - M 


and  co-integration  factor 


T.  = max  _A_r3/2(  — )3( am  - &T) 

4 msT-l  3 ox  T T‘ 

where h is a drift parameter. Applying the same procedure
described in Section 2.1.1, he derived the limiting
distributions under the null hypothesis of constant parameters
as follows:

$$
\lim \Pr[|T_3| > c] = \lim \Pr[|T_4| > c]
= \Pr\Big[\sup_{r \in [0,1]} |W^\circ(r)| > \sqrt{3}\,c\Big]
$$

The critical values can again be obtained by procedure 4) in
Section 2.1.1.


2.1.3. General Alternative Hypothesis

Until  now  I considered  the  alternative  hypothesis  that 
a single  structural  change  occurred  at  some  unknown  point. 
Here  I will  discuss  the  testing  procedure  of  parameter 
constancy  against  general  unspecified  alternatives.  An 
influential  paper  in  this  approach  is  Brown,  Durbin  and 
Evans  (1975),  which  proposed  the  CUSUM  and  CUSUM  of  squares 
tests  based  on  the  recursive  residuals.  However,  the  CUSUM 
test  turned  out  to  have  no  asymptotic  local  power  against 
movements  in  coefficients  of  zero-mean  regressors.  For  the 
power  problem  of  the  CUSUM  test  Ploberger,  Kramer  and 
Kontrus  (1989)  suggested  the  Fluctuation  test  and  Nyblom 
(1989)  proposed  the  Locally  Most  Powerful  test. 


2.1.3.1. CUSUM Test


The CUSUM test involves considering the plot of the quantity

$$ W_m = \frac{1}{\hat\sigma}\sum_{t=k+1}^{m} w_t, \qquad m = k+1, \ldots, T $$

where $w_t$ is the recursive residual. Under $H_0$,
probabilistic bounds for the path of $W_m$ can be determined,
and $H_0$ is rejected if $W_m$ crosses the boundary (associated
with the level of the test) for some m. This test is aimed
mainly at detecting systematic movements of coefficients.
Against haphazard rather than systematic types of movements,
Brown et al. proposed the CUSUM of Squares test, which uses the
squared recursive residuals and is based on a plot of the
quantities

$$
S_m = \frac{\sum_{t=k+1}^{m} w_t^2}{\sum_{t=k+1}^{T} w_t^2},
\qquad m = k+1, \ldots, T
$$

$H_0$ is rejected if the path of $S_m$ crosses a boundary
determined by the level of the test. These tests are of the
goodness-of-fit type in the sense that they seem applicable
against a wide variety of alternatives.

Ploberger and Kramer (1986) criticized the CUSUM test on the
ground that its power is obtained solely through leverage on
the mean of the dependent variable. Thus the CUSUM test has no
asymptotic local power against movements in coefficients of
zero-mean regressors. The CUSUM of Squares test fares even
worse, with local asymptotic power equal to size.
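
A minimal sketch of the two paths follows. It assumes the standard definition of the recursive residual, $w_t = (y_t - x_t'b_{t-1})/[1 + x_t'(X_{t-1}'X_{t-1})^{-1}x_t]^{1/2}$, where $b_{t-1}$ is the OLS estimate from the first t-1 observations, and it estimates $\hat\sigma$ from the full-sample residual sum of squares with divisor T - k. The function names are illustrative, and the boundary lines used to reach a test decision are not computed here.

```python
import numpy as np

def recursive_residuals(y, X):
    """Recursive residuals w_{k+1}, ..., w_T for the regression of y on X (T x k)."""
    T, k = X.shape
    w = np.empty(T - k)
    for t in range(k, T):  # t is the 0-based index of the newly added observation
        X_prev, y_prev = X[:t], y[:t]
        xtx_inv = np.linalg.inv(X_prev.T @ X_prev)
        b_prev = xtx_inv @ X_prev.T @ y_prev
        x_new = X[t]
        forecast_error = y[t] - x_new @ b_prev
        w[t - k] = forecast_error / np.sqrt(1.0 + x_new @ xtx_inv @ x_new)
    return w

def cusum_paths(y, X):
    """Return (W, S): the CUSUM path W_m and the CUSUM of Squares path S_m."""
    T, k = X.shape
    w = recursive_residuals(y, X)
    b_full, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ b_full) ** 2)
    sigma_hat = np.sqrt(rss / (T - k))
    W = np.cumsum(w) / sigma_hat
    S = np.cumsum(w ** 2) / np.sum(w ** 2)
    return W, S
```

A convenient check is that the squared recursive residuals sum to the full-sample OLS residual sum of squares, so that the path $S_m$ rises monotonically to 1 at m = T.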

2.1.3.2. Fluctuation Test

To address the power problem of the CUSUM test, Ploberger,
Kramer and Kontrus (1989) proposed the Fluctuation test, which
is based on successive parameter estimates rather than on
recursive residuals. A similar procedure for a single
regression model has been suggested by Sen (1980), and the
Fluctuation test was first suggested by Ploberger (1983).

Ploberger, Kramer and Kontrus considered the varying parameter
model and proposed the fluctuation test, which rejects the null
hypothesis of parameter constancy whenever the successive
estimates fluctuate too much. Their test statistic is

$$
S^{(T)} = \max_{t = K, \ldots, T} \frac{t}{\hat\sigma T}
\Big\|(X^{(T)\prime}X^{(T)})^{1/2}(\hat\beta^{(t)} - \hat\beta^{(T)})\Big\|_\infty
$$

where

$$ \hat\sigma = \Big[\sum_{t=1}^{T}(y_t - x_t'\hat\beta^{(T)})^2/(T - K)\Big]^{1/2} $$

and $\|\cdot\|_\infty$ denotes the maximum norm. They derived
the limiting distribution of the test statistic using the same
procedure described in Section 2.1.1.

1) The sequence of the test statistic is written as a function
of r:

$$
B^{(T)}(r) = \frac{[Tr]}{\hat\sigma T}(X^{(T)\prime}X^{(T)})^{1/2}
(\hat\beta^{([Tr])} - \hat\beta^{(T)})
$$

2) By the FCLT the sequence $B^{(T)}(r)$ converges weakly to
$W^\circ(r)$, a Brownian bridge.

3) By the continuous mapping theorem the test statistic is

$$ S^{(T)} = \sup_{0 \le r \le 1} \|B^{(T)}(r)\|_\infty $$

4) The critical values of the limiting distribution of the
test statistic can be calculated using the well-known boundary
crossing probabilities of the Brownian bridge in Section
2.1.1.
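
The statistic can be computed along the following lines. This is a sketch under the assumption that $S^{(T)}$ is the maximum over t = K, ..., T of $(t/(\hat\sigma T))$ times the maximum norm of $(X'X)^{1/2}(\hat\beta^{(t)} - \hat\beta^{(T)})$, with the symmetric matrix square root taken through an eigendecomposition; the function name is hypothetical.

```python
import numpy as np

def fluctuation_statistic(y, X):
    """Fluctuation test statistic based on successive OLS estimates."""
    T, K = X.shape
    # Symmetric square root of X'X via its eigendecomposition.
    eigval, eigvec = np.linalg.eigh(X.T @ X)
    root = eigvec @ np.diag(np.sqrt(eigval)) @ eigvec.T

    b_full, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma_hat = np.sqrt(np.sum((y - X @ b_full) ** 2) / (T - K))

    stat = 0.0
    for t in range(K, T + 1):
        b_t, *_ = np.linalg.lstsq(X[:t], y[:t], rcond=None)
        deviation = np.max(np.abs(root @ (b_t - b_full)))
        stat = max(stat, t / (sigma_hat * T) * deviation)
    return stat
```

The result is compared with the critical value for the relevant number of regressors; in small samples the earliest estimates, which rest on only a few observations, can be erratic.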

Table 2. Critical values of $S^{(T)}$

k     α=0.10   0.05    0.01
1     1.22     1.36    1.63
2     1.35     1.48    1.73
3     1.42     1.54    1.79
4     1.47     1.59    1.83
5     1.51     1.62    1.86


They also show that the fluctuation test has nontrivial local
power irrespective of the particular type of structural
change. However, Kontrus and Ploberger's (1984) Monte Carlo
results show that neither the CUSUM test nor the fluctuation
test dominates the other in small samples.


2.1.3.3. Locally Most Powerful Test

Nyblom (1989) developed the locally most powerful test against
parameter variation in the form of a martingale. The
martingale specification has the advantage of covering several
types of departure from constancy: for example, a single jump
at an unknown time point (change point model) or slow random
variation (typically a random walk). Under the assumption of no
parameter change the joint density f may be written as

$$
f(x_1, \ldots, x_T; \theta_0) = f_1(x_1; \theta_0)
\prod_{k=2}^{T} f_k(x_k \mid x_1, \ldots, x_{k-1}; \theta_0)
$$

Under the alternative it is assumed that the parameter changes
obey a martingale and the increment $\theta_k - \theta_{k-1}$
has the covariance matrix

$$ E[(\theta_k - \theta_{k-1})(\theta_k - \theta_{k-1})'] = \tau^2 G_k $$

with $G_k$ a known matrix. Then the density for the
observations is

$$
f^*(x_1, \ldots, x_T; \theta_0, \tau^2) = \int \cdots \int f_1(x_1; t_1)
\prod_{k=2}^{T} f_k(x_k \mid x_1, \ldots, x_{k-1}; t_k)\, dH(t_1, \ldots, t_T)
$$

where H is the joint distribution function of
$\theta_1, \ldots, \theta_T$. For small $\tau^2$ an
approximation to $f^*$ is obtained by finding the Taylor
expansion of the integrand at $\theta_0$. By imposing
sufficient regularity conditions it is obtained that

$$
\frac{f^*(x_1, \ldots, x_T; \theta_0, \tau^2)}{f(x_1, \ldots, x_T; \theta_0)}
= 1 + \frac{\tau^2}{2}\Big[\sum_{j=1}^{T}\sum_{k=1}^{T}
  d_j'\Big(\sum_{i=1}^{\min(j,k)} G_i\Big) d_k
  + \sum_{k=1}^{T} \mathrm{tr}\Big(D_k \sum_{i=1}^{k} G_i\Big)\Big] + o(\tau^2)
$$

where $d_k = \partial \log f_k/\partial\theta$ and
$D_k = \partial^2 \log f_k/(\partial\theta\,\partial\theta')$.
The expression in the brackets serves as the locally most
powerful test statistic for the null hypothesis
$H_0: \tau^2 = 0$ against the alternative $H_a: \tau^2 > 0$.
Nyblom suggested the test statistic

$$
L = \sum_{j=1}^{T}\sum_{k=1}^{T} d_j'\Big(\sum_{i=1}^{\min(j,k)} G_i\Big) d_k
  = \sum_{j=1}^{T}\Big(\sum_{k=j}^{T} d_k\Big)' G_j \Big(\sum_{k=j}^{T} d_k\Big) > c
$$

instead of the full locally most powerful test, which includes
the second term in the brackets. The proposed test statistic
L, based on cumulative sums of the score function, is often
asymptotically equivalent to the locally most powerful test
statistic. If the observations are independent under $H_0$, it
is sufficient that the elements of $G_k$ and the variances of
the elements of $D_k$ are bounded (independently of k). Then
$\mathrm{var}[n^{-2}\sum_k \mathrm{tr}(D_k \sum_i G_i)]$ tends
to 0 and $n^{-2}L$ has a nondegenerate limiting distribution
(n denotes the sample size, written T elsewhere in this
chapter).

Certain assumptions, especially

$$ T^{-1}\sum_{k=1}^{[Tr]} E_{k-1}(d_k d_k') \Rightarrow rJ, $$

are required to get the limiting distribution of the test
statistic using the same procedure as in Section 2.1.1.

1) The sequence, L, can be written as a function of r; that
is, the partial sum of the score function is a function of r.

2) By the FCLT the partial sums of the score function converge
weakly to a p-variate Wiener process,

$$ T^{-1/2}\sum_{k=1}^{[Tr]} d_k \Rightarrow W(r), $$

where W(r) has E[W(r)] = 0 and Cov[W(s), W(r)] = min(s, r) J.

3) By the continuous mapping theorem

$$ n^{-2}L \Rightarrow \int_0^1 [W(1) - W(r)]'\,G(r)\,[W(1) - W(r)]\,dr $$

4) Nyblom suggested using the Laplace transform of the
limiting distribution function to calculate the critical
values.

If the parameter differences are assumed to be identically
distributed and to entail uniform jump probabilities in the
change point model, then an appropriate choice is
$G(t) = J^{-1}$. Since the starting point $\theta_0$ is rarely
known, it must be replaced by an estimator $\hat\theta$. The
same procedure is applied to get the limiting distribution of
$n^{-2}L$.

1) Same as above.

2) By the FCLT the partial sums of the score function now
converge weakly to $W^\circ(r)$, a p-dimensional Brownian
bridge.

3) By the continuous mapping theorem

$$
n^{-2}L \Rightarrow \int_0^1 W^\circ(r)'\,J^{-1}\,W^\circ(r)\,dr
= \sum_{k=1}^{\infty} (\pi k)^{-2}\chi_k^2(p)
$$

where the $\chi_k^2(p)$ are independent chi-squared variables
with p degrees of freedom.

4) Nyblom calculated the critical values of the limiting
distribution by the Laplace transform of the density.
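
As a rough cross-check on tabulated percentage points, the limiting variable can also be simulated. The sketch below assumes the representation of the limit as $\sum_{k\ge 1}(\pi k)^{-2}\chi_k^2(p)$ with independent chi-squared terms and truncates the series at a finite number of terms; the function name, the truncation point, and the number of replications are illustrative choices, and this is not the Laplace transform method used by Nyblom.

```python
import numpy as np

def simulate_lmp_limit(p, reps=50_000, terms=100, seed=0):
    """Draws from the truncated series sum_{k<=terms} (pi k)^-2 * chi2_k(p)."""
    rng = np.random.default_rng(seed)
    weights = 1.0 / (np.pi * np.arange(1, terms + 1)) ** 2
    chi2_draws = rng.chisquare(df=p, size=(reps, terms))
    return chi2_draws @ weights

def upper_tail_point(p, alpha, **kwargs):
    """Monte Carlo estimate of the upper-tail percentage point at level alpha."""
    return float(np.quantile(simulate_lmp_limit(p, **kwargs), 1.0 - alpha))
```

Since $\sum_k (\pi k)^{-2} = 1/6$, the simulated mean should be close to p/6, which gives a quick check on the truncation before the quantiles are compared with tabulated values.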


Table 3. Asymptotic upper-tail percentage points of LMPT

α      df=1    2       3       4       6
0.10   .347    .607    .841    1.063   1.487
0.05   .461    .748    1.000   1.237   1.686
0.01   .743    1.074   1.359   1.623   2.117


Nyblom  noted  that  for  large  samples  a suitable  choice  is 
Jn(&n)  defined  by  Jn(0o)  = Ek_1[dk(Q0)  dk(Qa) ']  . For  the 

standard  linear  regression  model  the  test  statistic  for  the 
constancy  of  the  regression  is 

L = trlS^Y,  <£  ekxk)  (J2  ekx'^  Ud2  > c 

j=l  k=j  k=j 

where  ek  = yk  - Q'xk,d2  = (T  - p)  ek  , and  ST  = T 


2.1.3.4. MeanChow Test


Hansen (1990) proposed the MeanChow test, which is the average
of the Chow sequence. As mentioned above, the test statistics
of Quandt, Kim and Siegmund, and Chu are of the MaxChow type:
the maximum of the Chow sequence, designed to detect a single
abrupt structural change within the sample period. Hansen's
MeanChow statistic, like the Fluctuation test and the Locally
Most Powerful test, tests the null hypothesis of parameter
constancy against more general alternatives. He suggested two
statistics, one for stationary regressors and the other for
nonstationary regressors. Here I discuss the stationary case
only; the nonstationary case is discussed in Section 2.2. The
standard Chow test statistic for coefficient constancy with r
known is given by

$$ C(r) = b_r' V_b^{-1} b_r $$

where $b_r$ is the OLS estimator of $\beta$ and $V_b$ is a
consistent estimate of the variance-covariance matrix of
$b_r$.

To get the limiting distribution, the same procedure as in
Section 2.1.1 is applied.

1) The Chow process is given as a function of r.

2) By the FCLT and CMT the Chow process converges weakly to a
function of the Brownian bridge:

$$ C(r) \Rightarrow \frac{W^\circ(r)'W^\circ(r)}{r(1-r)} $$

Recall that Chu (1989) derived the square root of the sequence
first.

3) Hansen constructed the MeanChow test statistic on the range
K = [0.15, 0.85], as suggested by Andrews (1990). This choice
reflects the endpoint problem: the Chow process must be
evaluated at points bounded away from the beginning and end of
the sample, which is an important component of the asymptotic
theory.

$$
\text{MeanChow} = \frac{1}{T_2 - T_1 + 1}\sum_{j=T_1}^{T_2} C(j/T),
\qquad T_1 = .15T, \quad T_2 = .85T
$$

By the continuous mapping theorem the limit distribution of
MeanChow is

$$ \text{MeanChow} \Rightarrow \int_K C(r)\,dr \Big/ \int_K dr $$

4) Hansen tabulated the asymptotic critical values of the test
statistic by Monte Carlo methods.
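
A Monte Carlo tabulation of this kind can be sketched as follows. The sketch assumes that the limit is the average over K = [.15, .85] of $W^\circ(r)'W^\circ(r)/(r(1-r))$ for a k-dimensional Brownian bridge, and it approximates the bridge by scaled partial sums of standard normal draws on an equally spaced grid; the function name, grid size, and number of replications are illustrative and are not Hansen's settings.

```python
import numpy as np

def simulate_meanchow_limit(k=1, reps=5000, grid=500, lo=0.15, hi=0.85, seed=0):
    """Simulated draws of the average of W0(r)'W0(r) / (r (1 - r)) over [lo, hi]."""
    rng = np.random.default_rng(seed)
    r = np.arange(1, grid + 1) / grid
    keep = (r >= lo) & (r <= hi)
    draws = np.empty(reps)
    for i in range(reps):
        # Discrete Wiener process on the grid, one column per dimension.
        wiener = np.cumsum(rng.standard_normal((grid, k)), axis=0) / np.sqrt(grid)
        # Tie the path down at r = 1 to obtain a Brownian bridge.
        bridge = wiener - np.outer(r, wiener[-1])
        chow = np.sum(bridge[keep] ** 2, axis=1) / (r[keep] * (1.0 - r[keep]))
        draws[i] = chow.mean()
    return draws
```

Upper percentage points are then read off with, for example, `np.quantile(draws, 0.95)`. Because each component of the bridge has variance r(1-r), the simulated mean should be close to k, which serves as a simple check on the construction.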


Table 4. Upper percentage points of the MeanChow distribution
Stationary regressors: K = [.15, .85]

k     α=.10    .05     .01
1     2.15     2.86    4.61
2     3.73     4.63    6.52
3     5.18     6.19    8.56
4     6.46     7.60    10.16
5     7.75     8.93    11.61



2.2. Nonstationary Regressors


Concerning testing procedures for parameter constancy in models
with stationary regressors, after Chow's (1960) test with a
known break point, many of the econometricians mentioned in
Section 2.1 developed tests of parameter constancy with an
estimated break point. For models with nonstationary
regressors, Perron (1989) initiated the study of unit root
testing procedures that allow for a structural change in the
trend function under the alternative hypothesis. Perron's
testing procedure is based on a break point known a priori.
Christiano's (1988) criticism was that the date of the break
ought not to be treated as known but rather as unknown.
Following Christiano's criticism, Banerjee, Lumsdaine and
Stock (1990), Zivot and Andrews (1990) and Hansen (1990)
developed testing procedures with an estimated break point.
These tests are in effect joint tests of a unit root against
stationarity and of parameter constancy.

The main problem in making inference about models with
nonstationary regressors is that the limiting distribution of
the test statistic depends upon nuisance parameters. Section
2.2.1 discusses methods that eliminate the nuisance parameter
dependency from the limiting distribution of a test statistic.
Section 2.2.2 reviews several testing procedures.




2.2.1.  Nuisance  Parameter  Free  Limiting  Distribution 

The same procedure described in Section 2.1.1 can be applied
to get the asymptotic distribution of the test statistic for
parameter constancy in the model with nonstationary
regressors. However, if we allow the disturbances to be
correlated with the errors of the nonstationary regressors and
the errors are heterogeneously distributed, then the
asymptotic distributions become nonstandard in that they
depend on nuisance parameters.

Consider the regression

$$ y_t = \mu + \theta t + A x_t + u_t, \qquad x_t = x_{t-1} + v_t $$

The basic idea is that if the nonstationary regressor
$\{x_t\}$ is strictly exogenous, that is, when $\{x_t\}$ is
driven by a process $\{v_t\}$ that is generated independently
of the regression error process $\{u_t\}$, then the nuisance
parameters vanish from the limiting distribution of a test
statistic.

Let us look more closely at the functional central limit
theorem of section 2.1.1. If the functional is

    $$ f^*(B, M) = \int_0^1 dB\, M' \left( \int_0^1 M M' \right)^{-1} $$

then $f^*(B, M) = N(0, \Omega \otimes I_p)$, where $\Omega$ is the
covariance matrix of the Brownian motion $B$. This implies that,
upon appropriate standardization, conventional asymptotic theory
applies to regressions with integrated regressors if those
regressors are strictly exogenous. If they are not, however, the
coefficients in the model with nonstationary regressors converge
weakly to a different functional.

Define the functional

    $$ f(B, M, E) = \left( \int_0^1 dB\, M' + E' \right) \left( \int_0^1 M M' \right)^{-1} $$

where $B_1$ is the Brownian motion associated with the
nonstationary regressors and $B_2$ is the one associated with the
regression error process. The matrix $M$ is a stochastic process
with continuous sample paths and can be a function of $B_2$, and
$E$ is a matrix (which may be random) of conformable dimension.
Park and Phillips (1988) show that the coefficients of the model
with integrated regressors converge weakly to this functional at
different rates. The coefficient of the nonstationary regressor,
$A$, converges at rate $T$, $T(\hat{A} - A) \Rightarrow f(\cdot)$;
the constant, $\mu$, converges at rate $\sqrt{T}$,
$\sqrt{T}(\hat{\mu} - \mu) \Rightarrow f(\cdot)$; and the
coefficient of the trend, $\theta$, converges at rate $T^{3/2}$,
$T^{3/2}(\hat{\theta} - \theta) \Rightarrow f(\cdot)$. The
source of the nuisance parameter dependency is $E$.

Two approaches have been suggested to eliminate the nuisance
parameter dependency from the limiting distribution of a
test statistic.


2.2.1.1.  Augmented Dickey-Fuller (ADF) Approach

The first approach adds extra lags of the first differences of
the data as regressors. The use of the ADF statistics is subject
to one qualification: the number of lags required for the
augmented Dickey-Fuller regression to have the correct size and
good power properties tends to be rather large. With the ADF
procedure, the errors are restricted to the class of ARMA(p,q)
processes. When the error sequence satisfies certain assumptions,
the arguments outlined in Said and Dickey (1984) imply that the
limiting distributions of the test statistics computed from the
ADF regression are free of nuisance parameter dependencies.
Banerjee, Dolado, and Galbraith (1990), Zivot and Andrews (1990)
and Banerjee, Lumsdaine and Stock (1990) follow this approach
without proving the efficacy of the ADF approach.
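
As an illustration of the augmentation idea, the following is a
minimal sketch of the t-statistic for a unit root computed from an
ADF regression with a constant and a time trend. It assumes NumPy;
the function name `adf_t_stat` and the argument `lags` are
hypothetical, and the lag length is left to the user. The statistic
must be compared with Dickey-Fuller critical values, not with
standard t tables.

```python
import numpy as np

def adf_t_stat(y, lags=1):
    """t-statistic for alpha = 1 in
    y_t = mu + beta*t + alpha*y_{t-1} + sum_j c_j*dy_{t-j} + e_t  (illustrative sketch)."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                          # dy[i] = y[i+1] - y[i]
    n = len(dy) - lags                       # number of usable observations
    target = y[lags + 1:]                    # dependent variable y_t
    cols = [
        np.ones(n),                                   # constant
        np.arange(lags + 1, len(y), dtype=float),     # time trend
        y[lags:-1],                                   # y_{t-1}
    ]
    # Augmentation terms: lagged first differences dy_{t-j}, j = 1..lags
    for j in range(1, lags + 1):
        cols.append(dy[lags - j:len(dy) - j])
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    s2 = resid @ resid / (n - X.shape[1])    # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)        # OLS covariance matrix
    return (coef[2] - 1.0) / np.sqrt(cov[2, 2])
```

For a clearly stationary series the statistic is strongly negative;
for a random walk it typically stays above the Dickey-Fuller
critical values.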


2.2.1.2.  Nonparametric Transformation


The nonparametric variants were developed by Phillips (1987).
Let us define

    $$ \Omega = \lim_{T\to\infty} T^{-1} E(S_T S_T'), \quad
       \Sigma = \lim_{T\to\infty} T^{-1} \sum_{t=1}^{T} E(w_t w_t'), \quad
       \Lambda = \lim_{T\to\infty} T^{-1} \sum_{t=2}^{T} \sum_{j=1}^{t-1} E(w_j w_t') $$

where $w_t = (u_t', v_t')'$ and $S_t = \sum_{j=1}^{t} w_j$.
Park and Phillips (1988) show that the source of the nuisance
parameter dependency, $E$, is a function of
$\Delta_{21} = \Sigma_{21} + \Lambda_{21}$, where the subscripts
denote the partition of the corresponding matrix. They proved
that if $\Omega_{21} = \Delta_{21} = 0$ (strict exogeneity of the
non-stationary regressors), then conventional asymptotic theory
applies; in particular, the Wald-type test statistic is
asymptotically distributed as chi-square. Otherwise, they
proposed a nonparametric transformation of the test statistic to
eliminate the parameter dependency.

    $$ H(A) = G(A) - 2T\,\mathrm{tr}\{(\hat{A} - A)\Delta_{21}\}
       + T^{2}\,\mathrm{tr}\Big\{\Big(\sum_{t=1}^{T} x_t x_t'\Big)^{-1}\Delta_{21}\Big\} $$

The limiting distributions of the H statistics, by
construction, do not depend on the nuisance parameters, $\Delta_{21}$.

Hansen (1990) extended this proposition to the test for
a structural break with nonstationary regressors. To apply
conventional asymptotic theory, two sources of non-normality,
$\Delta_{21}$ and $\Omega_{21}$, must be eliminated from the
functional. He assumed $\Delta_{21} = 0$ (weak exogeneity of the
non-stationary regressors) on the grounds that it vanishes if a
sufficient number of lagged values $\Delta x_{t-k}$ are included.
He then proposed a two-stage method that yields a
nuisance-parameter-free limiting distribution for the Chow
statistics. First he modified the dependent variable, giving
$y_t^*$, using the OLS estimates of the original regression; he
then obtained the fully modified estimate of $A$, denoted
$\hat{A}^+$, by regressing $y_t^*$ on $x_t$. The Chow statistic
based on the fully modified estimators has an asymptotic
chi-square distribution.

Schwert's (1989) simulations suggested that nonparametric
corrections do not perform well even in large samples. He showed
that for some error processes (especially MA processes), the
nonparametric variants of the Dickey-Fuller tests have low power
and incorrect size in samples as large as 1000, and that the ADF
test is to be preferred to nonparametrically corrected
Dickey-Fuller test statistics.


2.2.2.  Several  Tests 

Here I review Perron's (1989) procedure for testing a unit root
with drift at a known break point and then discuss several tests
with an unknown break point. Under the assumption of a single
structural change, Zivot and Andrews (1990) proposed the min-t
test and Banerjee, Lumsdaine and Stock (1990) suggested recursive
and sequential tests. For a general, unspecified structural break
in a model with non-stationary regressors, rather than a single
structural change, Hansen (1990) developed the MeanChow test.
Empirical applications of these testing procedures are reviewed
in Chapter 4.


2.2.2.1.  Perron's Hypothesis


Perron (1989) developed a procedure for testing the null
hypothesis that a given series has a unit root with drift and
that an exogenous structural break occurs at a time known a
priori, against the alternative hypothesis that the series is
stationary about a deterministic time trend with an exogenous
change in the trend function at a known time. He considered
three models:

    Model (A)  $y_t = \mu^A + \theta^A DU_t + \beta^A t + d^A D(m)_t + \alpha^A y_{t-1} + e_t^A$

    Model (B)  $y_t = \mu^B + \theta^B DU_t + \beta^B t + \gamma^B DT^*_t + \alpha^B y_{t-1} + e_t^B$

    Model (C)  $y_t = \mu^C + \theta^C DU_t + \beta^C t + \gamma^C DT^*_t + d^C D(m)_t + \alpha^C y_{t-1} + e_t^C$

where

    $D(m)_t = 1$ if $t = m+1$, and $0$ otherwise;
    $DU_t = 1$ if $t > m$, and $0$ otherwise;
    $DT^*_t = t - m$ if $t > m$, and $0$ otherwise.
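
The three dummy variables can be built directly from their
definitions. The sketch below assumes NumPy and a 1-based time
index t = 1, ..., T; the function name `perron_dummies` is
hypothetical.

```python
import numpy as np

def perron_dummies(T, m):
    """Return (DU, DT_star, D_m) for a break after period m, with t = 1..T."""
    t = np.arange(1, T + 1)
    DU = (t > m).astype(float)                        # level shift: 1 if t > m
    DT_star = np.where(t > m, t - m, 0).astype(float) # slope change: t - m if t > m
    D_m = (t == m + 1).astype(float)                  # one-period pulse at t = m + 1
    return DU, DT_star, D_m
```

With T = 6 and m = 3, DU is (0, 0, 0, 1, 1, 1), DT* is
(0, 0, 0, 1, 2, 3) and D(m) is (0, 0, 0, 1, 0, 0).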

He assumed that the innovation series is of the ARMA(p,q)
type with the orders p and q possibly unknown. The null
hypothesis of a unit root imposes the following restrictions:

The crash hypothesis:

    Model (A):  $\alpha^A = 1$, $\beta^A = 0$, $\theta^A = 0$

The breaking slope with no crash:

    Model (B):  $\alpha^B = 1$, $\gamma^B = 0$, $\beta^B = 0$

Both effects:

    Model (C):  $\alpha^C = 1$, $\gamma^C = 0$, $\beta^C = 0$

Under the alternative hypothesis of a "trend stationary"
process, we expect $\alpha^A, \alpha^B, \alpha^C < 1$;
$\beta^A, \beta^B, \beta^C \neq 0$; and
$\theta^A, \theta^C, \gamma^B, \gamma^C \neq 0$. Furthermore,
under the alternative hypothesis $d^A$, $d^C$ and $\theta^B$
should be close to zero, while under the null they are expected
to be significantly different from zero.

Perron derived the asymptotic distributions of
$T(\hat{\alpha}^i - 1)$, $i = A, B, C$, and of the t-ratio
$t_{\hat{\alpha}^i}$ under the null hypothesis of a unit root.
The asymptotic distributions of the various statistics are not
influenced by the introduction of the dummy variable $D(m)_t$,
since it affects only a single period. This implies that only
two sets of critical values need be evaluated, one corresponding
to model (A) and the other to models (B) and (C). He found that,
apart from $r$ (the break-point ratio), the limiting
distributions depend on additional nuisance parameters,

    $$ \sigma^2 = \lim_{T\to\infty} T^{-1} E[S_T^2] \quad \text{and} \quad
       \sigma_e^2 = \lim_{T\to\infty} T^{-1} E\Big[\sum_{t=1}^{T} e_t^2\Big],
       \quad \text{where } S_T = \sum_{t=1}^{T} e_t. $$

In the case of weakly stationary innovations, $\sigma^2$ is
equal to $2\pi f(0)$, where $f(0)$ is the spectral density of
$\{e_t\}$ evaluated at frequency zero, and $\sigma_e^2$ is the
variance of the innovations. When the innovation sequence
$\{e_t\}$ is independent and identically distributed,
$\sigma^2 = \sigma_e^2$ and the limiting distributions are
invariant with respect to nuisance parameters other than $r$.
Perron tabulated the critical values of the limiting
distribution for given values of $r$ from 5,000 simulation
replications. The critical values under the various models are
significantly larger in absolute value than the standard
Dickey-Fuller critical values.


2.2.2.2.  Zivot and Andrews' Min-t Test


Zivot and Andrews (1990) questioned Perron's assumption that
$r$ is exogenous and instead treated the structural break as an
endogenous occurrence. Their null hypothesis for Perron's three
models is an integrated process with drift,

    $$ y_t = \mu + y_{t-1} + e_t $$

The break point, $m$, is selected by an estimation procedure
designed to fit the series to a certain trend stationary
representation. They suggested choosing the $m$ that gives the
least favorable result for the null hypothesis. That is, $m$ is
chosen to minimize the one-sided t-statistic for testing
$\alpha^i = 1$, where small values of the statistic lead to
rejection of the null. Zivot and Andrews proposed the test
statistic

    $$ t_{\hat{\alpha}^i}(\hat{m}_{\inf}) = \inf_{m \in D} t_{\hat{\alpha}^i}(m) $$

where $D$ is a specified closed subset of (0,1).

Zivot and Andrews derived the limiting distribution of this
statistic when the disturbances are independent and there are no
extra lag terms in the regression equations,

    $$ \inf_{m \in D} t_{\hat{\alpha}^i}(m) \Rightarrow
       \inf_{m \in D} \Big[ \int_0^1 W^i(m, s)^2\, ds \Big]^{-1/2}
       \Big[ \int_0^1 W^i(m, s)\, dW(s) \Big] \quad \text{as } T \to \infty $$

As discussed in section 2.2.1, if the disturbances are allowed
to be correlated and heterogeneously distributed, then the
limiting distribution depends on nuisance parameters. They
followed the ADF approach, as Perron did. The errors are assumed
to be restricted to the class of ARMA(p,q) processes, and extra
lags of the first differences of the series are added as
regressors. The test statistics computed from the ADF regression
equations then have limiting distributions free of nuisance
parameter dependencies. They tabulated critical values for the
limiting distributions by simulation. The integral functions are
approximated by functions of sums or partial sums of independent
normal random variables.
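
The break-point search can be sketched as a loop over candidate
break dates. The following is a minimal illustration for a
mean-shift regression of the Model (A) type without the one-period
dummy and without augmentation lags; it assumes NumPy, and the
function names `t_alpha` and `min_t` and the trimming fraction
`trim` are hypothetical choices, not part of Zivot and Andrews'
own description.

```python
import numpy as np

def t_alpha(y, m):
    """t-statistic for alpha = 1 in
    y_t = mu + theta*DU_t(m) + beta*t + alpha*y_{t-1} + e_t."""
    T = len(y)
    t = np.arange(2, T + 1)                 # 1-based time index of y_t
    X = np.column_stack([
        np.ones(T - 1),                     # constant
        (t > m).astype(float),              # DU_t(m)
        t.astype(float),                    # time trend
        y[:-1],                             # y_{t-1}
    ])
    target = y[1:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    s2 = resid @ resid / (len(target) - X.shape[1])
    cov = s2 * np.linalg.inv(X.T @ X)
    return (coef[3] - 1.0) / np.sqrt(cov[3, 3])

def min_t(y, trim=0.15):
    """Return (m_hat, minimum t-statistic) over trimmed candidate break dates."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    candidates = range(int(np.ceil(trim * T)), int(np.floor((1 - trim) * T)) + 1)
    stats = {m: t_alpha(y, m) for m in candidates}
    m_hat = min(stats, key=stats.get)       # least favorable date for the null
    return m_hat, stats[m_hat]
```

The minimized statistic must be compared with critical values for
the min-t distribution, such as those in Table 5, rather than with
Perron's fixed-break critical values.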


Table 5

Critical values for the asymptotic distribution of min-t

    Model     a=1%      5%       10%
    A        -5.34    -4.80    -4.58
    B        -4.93    -4.42    -4.11
    C        -5.57    -5.08    -4.82


2.2.2.3.  Hansen's MeanChow Test


In section 2.1.3 we discussed Hansen's MeanChow test and
compared it with the MaxChow test. Hansen (1990) extended the
MeanChow test to the model with non-stationary regressors. As
mentioned in section 2.2.1, Hansen follows the nonparametric
transformation approach to obtain a nuisance-parameter-free
asymptotic distribution for the MeanChow test statistic. He
developed a two-stage method for computing the fully modified
estimator, which serves as the basis for eliminating the
parameter dependency from the limiting distribution.

Consider the multiple regression model

    $$ y_t = A_{1t} x_{1t} + A_{2t} x_{2t} + e_t $$

where $x_{1t}$ is the set of stationary regressors and $x_{2t}$
is the set of non-stationary regressors. His two-stage method
for the fully modified estimator first estimates $A_1$ by OLS
and then modifies the dependent variable,

    $$ y_t^* = y_t - \hat{A}_1 x_{1t} - [\ldots] $$

where the submatrix of $\Omega$ is defined in section 2.2.1 and

    $$ \eta_t = \sum_{j=1}^{M} a(j, M)\, \hat{v}_{2,t+j} $$

with $a(j, M)$ a sequence of fixed weights. Hansen suggested
using the triangular window for the weights,

    $$ a(j, M) = \Big(1 - \frac{j}{M+1}\Big)\Big(\frac{[\ldots]}{M}\Big) $$
The fully modified estimate of $A_2$, denoted by $\hat{A}_2^+$,
is obtained by regressing $y_t^*$ on $x_{2t}$. The natural Chow
test of a structural break at a known time $[Tr]$ is based on
the OLS regression

    $$ y_t^* = \hat{A}_2 x_{2t} + \hat{\Gamma}_{2r} x_{2t} I(t \le [Tr]) + \hat{e}_t $$

and the test statistic

    $$ \mathrm{Chow}_2(r) = \hat{\gamma}_{2r}' \hat{V}_{2r}^{-1} \hat{\gamma}_{2r} $$

where $\hat{\gamma}_{2r}$ is the vectorization of
$\hat{\Gamma}_{2r}$. Hansen showed that, for any fixed $r$, the
limiting distribution of the Chow statistic is chi-square with
degrees of freedom determined by $n_2$ and $n_3$, where $n_2$ is
the number of non-stationary regressors and $n_3$ is the number
of equations in the system.

Hansen chooses the mean as the mapping of the Chow process
into the real line, over the range [0.15, 0.85] suggested by
Andrews (1989). As in section 2.1.3, the MeanChow test for the
model with non-stationary regressors is

    $$ \mathrm{MeanChow} = \frac{1}{T_2 - T_1} \sum_{j=T_1}^{T_2} C(j/T),
       \quad T_1 = 0.15T, \; T_2 = 0.85T $$

By the continuous mapping theorem, the limit distribution of
MeanChow is

    $$ \mathrm{MeanChow} \Rightarrow \int_K C(r)\, dr \Big/ \int_K dr $$

where $K = [0.15, 0.85]$. Hansen tabulated the asymptotic
critical values of the MeanChow statistic by Monte Carlo methods.
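
The averaging step can be illustrated with ordinary OLS Chow
(Wald) statistics. The sketch below assumes NumPy and is
deliberately simplified: it does not apply the fully modified
correction, so it is only an illustration of the mean mapping
over [0.15T, 0.85T], not of Hansen's complete procedure. The
function names `chow_stat` and `mean_chow` are hypothetical.

```python
import numpy as np

def chow_stat(y, X, j):
    """Wald statistic for gamma = 0 in y_t = x_t'b + x_t'gamma*I(t <= j) + e_t."""
    T, k = X.shape
    before = (np.arange(1, T + 1) <= j).astype(float)[:, None]
    Z = np.hstack([X, X * before])               # regressors plus break interactions
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    s2 = resid @ resid / (T - 2 * k)
    V = s2 * np.linalg.inv(Z.T @ Z)[k:, k:]      # covariance block of gamma-hat
    gamma = coef[k:]
    return float(gamma @ np.linalg.solve(V, gamma))

def mean_chow(y, X, lo=0.15, hi=0.85):
    """Sum of Chow statistics over T1..T2 divided by (T2 - T1), as in the text."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    T = len(y)
    T1, T2 = int(np.ceil(lo * T)), int(np.floor(hi * T))
    stats = [chow_stat(y, X, j) for j in range(T1, T2 + 1)]
    return sum(stats) / (T2 - T1)
```

Because every candidate date contributes to the average, the
statistic responds to instability anywhere in the interior of the
sample rather than at a single date.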

2.2.2.4.  Banerjee, Lumsdaine and Stock's Test

Banerjee,  Lumsdaine  and  Stock  (1990)  proposed  the 
recursive  tests  for  a unit  root  and  the  sequential  tests  for 
changes  in  coefficients.  Their  motivation  for  considering 
recursive  unit  root  tests  is  that  the  process  might  be  well 
approximated  as  having  a unit  root  over  part  of  the  sample 
but  not  over  another  part.  The  sequential  tests  are 
constructed  for  testing  a unit  root  against  some  trend- 
shift/mean-shift  alternatives. 

First, Banerjee et al. considered the model

    Model I:  $y_t = \mu_0 + \mu_1 t + \alpha y_{t-1} + \beta(L)\Delta y_{t-1} + e_t$

where $\beta(L)$ is a lag polynomial of known order $p$ with the
roots of $1 - \beta(L)L$ outside the unit circle. Under the null
hypothesis $\alpha = 1$ and $\mu_1 = 0$. The t-statistic testing
the hypothesis that $\alpha = 1$ in Model I, computed over the
full sample of $T$ observations, is the standard Dickey-Fuller
(1979) t-statistic for a unit root with a constant and a time
trend in the regression. Banerjee et al. developed the
asymptotic theory for the recursively computed estimators and
t-statistics. They found that, because the covariance matrix of
the transformed regressors is block diagonal, the recursive
estimation of the nuisance parameters does not affect the
asymptotic distribution of the recursive Dickey-Fuller statistic
or the limiting distribution of the modified recursive
Sargan-Bhargava statistic. They examined five statistics for
recursive unit root tests (five different mappings from the
recursive D-F and S-B processes to the real line): 1) the
full-sample D-F statistic, $t_{DF}$; 2) the maximal D-F
statistic, $t_{DF}^{\max}$; 3) the minimal D-F statistic,
$t_{DF}^{\min}$; 4) $t_{DF}^{\mathrm{diff}} = t_{DF}^{\max} - t_{DF}^{\min}$;
and 5) the minimal value of the recursive modified S-B
statistic, $R^{\min}$. They tabulated the critical values of
each statistic from 2,000 Monte Carlo replications.


Table 6

Critical values for Recursive Unit Root Tests: 10% (5%)

      T      t_DF      t_DF^max   t_DF^min   t_DF^diff    R^min
     100    -3.15      -1.93      -3.88       2.95        .0195
           (-3.45)    (-2.21)    (-4.13)     (3.37)      (.0165)
     250    -3.13      -1.88      -3.80       2.98        .0199
           (-3.43)    (-2.14)    (-4.07)     (3.36)      (.0170)
     500    -3.13      -1.88      -3.82       3.01        .0198
           (-3.42)    (-2.14)    (-4.10)     (3.45)      (.0173)

Note that the critical values of $t_{DF}^{\min}$ differ from
those of Zivot and Andrews. They reported Monte Carlo results
indicating that no single recursive statistic provides a
reliable test against root-shift alternatives.

Banerjee et al. constructed sequential tests for a shift or
jump in the trend at an unknown point. The model considered is

    Model II:  $y_t = \mu_0 + \mu_1 \tau_t(m) + \mu_2 t + \alpha y_{t-1} + \beta(L)\Delta y_{t-1} + \omega' x_{t-1}(m) + e_t$

The deterministic regressor $\tau_t(m)$ captures the possibility
of a shift in trend, $\tau_t(m) = (t - m)\,1(t > m)$, or a jump
in the trend, $\tau_t(m) = 1(t > m)$, at period $m$. The
sequential statistics are computed using the full sample,
sequentially incrementing the date of the hypothetical break,
$m$. They show that the asymptotic distribution of the
sequential t-statistic does not depend on the nuisance
parameters because the covariance matrix of the transformed
regressors is block diagonal. Three sequential statistics are
examined: 1) the maximum of the sequential F-statistics testing
the hypothesis that $\mu_1 = 0$, $F_T^{\max}$; 2) the sequential
D-F statistic evaluated at the value of $m$ that maximizes
$F_T$, $t_{DF}(m^*)$; and 3) the minimal D-F statistic over all
the sequentially computed D-F statistics, $t_{DF}^{\min *}$.
Tables 7 and 8 show the asymptotic critical values of the three
statistics for the trend-shift and mean-shift cases.

They found in Monte Carlo studies that the size of the tests
is approximately equal to the nominal level. The power results
show that the trend break is detected with high probability
(the unit root tests reject with high probability). The
full-sample standard D-F statistic fails to detect stationarity
around a shifting trend, particularly if the break is in the
second half of the sample. This confirmed Perron's (1989)
results and interpretation: the permanent shift in the
deterministic trend is mistaken for a persistent innovation to
a stochastic trend.

Table 7

Critical values of the sequential statistics
Trend shift: 10% (5%)

      T      F_T^max    t_DF(m*)   t_DF^min*
     100     14.30      -4.20      -4.20
            (16.74)    (-4.51)    (-4.51)
     250     12.96      -4.10      -4.11
            (15.69)    (-4.41)    (-4.42)
     500     13.20      -4.09      -4.11
            (15.29)    (-4.38)    (-4.38)

Table 8

Critical values of the sequential statistics
Mean shift: 10% (5%)

      T      F_T^max    t_DF(m*)   t_DF^min*
     100     15.91      -4.51      -4.52
            (18.40)    (-4.82)    (-4.83)
     250     16.42      -4.49      -4.51
            (18.61)    (-4.75)    (-4.75)
     500     16.70      -4.53      -4.55
            (19.03)    (-4.79)    (-4.81)

Note that Banerjee et al.'s critical values for the min-t
statistic differ from those of Zivot and Andrews (1990).
Banerjee et al. derived the asymptotic theory for the recursive
D-F and S-B statistics and the sequential D-F statistics, but
did not derive the exact form of the asymptotic distribution of
a specific mapping from the processes of those statistics to
the real line, for example the max, min or mean of those
statistics (step 4 in section 2.1.1). This may explain why
their critical values differ from those of others.


CHAPTER  3 


THE  BAYESIAN  APPROACH 


We begin our discussion of Bayesian inference for a
structural change by deriving the marginal posterior
distributions of the parameters. Throughout this chapter I
assume a flat prior distribution for the parameters, including
the unknown break point m, on the grounds that the break is
equally likely at any observation point. This simple flat-prior
Bayesian inference is discussed in detail later. Section 3.1
derives the posterior distributions of the parameters under the
flat prior. Section 3.2 reports Monte Carlo results on the
detection of a break point, which indicate that a structural
change is easier to detect when the series is explosive.


3.1  Posterior  Distributions  from  Flat  Prior 


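Consider the simple two-phase regression model

    $$ y_i = \alpha_1 + \beta_1 x_i + \epsilon_i, \quad i = 1, \ldots, m $$
    $$ y_i = \alpha_2 + \beta_2 x_i + \epsilon_i, \quad i = m+1, \ldots, T $$

Under the assumption of independent normal errors, the
likelihood function is

    $$ L = (2\pi\sigma^2)^{-T/2} \exp\Big[ -\frac{1}{2\sigma^2} \Big( \sum_{i=1}^{m} (y_i - \alpha_1 - \beta_1 x_i)^2
       + \sum_{i=m+1}^{T} (y_i - \alpha_2 - \beta_2 x_i)^2 \Big) \Big] $$

In this simple case Holbert (1982) derived the posterior
density of the change point $m$ under the following vague prior
densities:

    $$ P_0(\alpha_1, \beta_1, \alpha_2, \beta_2) \propto \text{constant}, \quad -\infty < \alpha_i, \beta_i < \infty $$
    $$ P_0(m) = 1/(T-3), \quad m = 2, \ldots, T-2 $$
    $$ P_0(\sigma^2) \propto 1/\sigma^2, \quad 0 < \sigma^2 < \infty $$

Combining the prior and the likelihood we obtain the joint
posterior density of the parameters,

    $$ P_1(m, \sigma^2, \alpha_1, \beta_1, \alpha_2, \beta_2) \propto (\sigma^2)^{-(T/2+1)}
       \exp\Big[ -\frac{1}{2\sigma^2} \Big( \sum_{i=1}^{m} (y_i - \alpha_1 - \beta_1 x_i)^2
       + \sum_{i=m+1}^{T} (y_i - \alpha_2 - \beta_2 x_i)^2 \Big) \Big] $$

Integrating over $\sigma^2$, $\alpha_1$, $\alpha_2$, $\beta_1$,
$\beta_2$ gives the posterior density of the change point $m$,

    $$ P_1(m \mid \text{data}) \propto \Big[ m(T-m) \sum_{i=1}^{m} (x_i - \bar{x}_{1,m})^2 \sum_{i=m+1}^{T} (x_i - \bar{x}_{m+1,T})^2 \Big]^{-1/2}
       \Big[ \sum_{i=1}^{m} (y_i - \hat{y}_{i(1,m)})^2 + \sum_{i=m+1}^{T} (y_i - \hat{y}_{i(m+1,T)})^2 \Big]^{-(T-4)/2},
       \quad m = 2, \ldots, T-2 $$

where $\bar{x}_{k,j} = \big(\sum_{i=k}^{j} x_i\big) / (j - k + 1)$
and $\hat{y}_{i(k,j)} = \hat{\alpha}_{(k,j)} + \hat{\beta}_{(k,j)} x_i$.

Holbert found that the marginal posterior density of the
regression coefficients was a mixture of bivariate t
densities where the mixing distribution was the marginal
posterior mass function of m.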

Let us consider the standard multiple regression model
with a single structural change,

    $$ y = W\theta + e, \quad \text{where } W = \begin{pmatrix} X_1 & 0 \\ 0 & X_2 \end{pmatrix}, \quad
       \theta = \begin{pmatrix} \theta_1 \\ \theta_2 \end{pmatrix}, \quad
       e = \begin{pmatrix} e_1 \\ e_2 \end{pmatrix} $$

Suppose that $e$ is independently and normally distributed
with a zero mean vector and unknown variance $\sigma^2$. Then
the likelihood function for $m$, $\theta$, and $\sigma^2$ is
given by

    $$ L(\theta_1, \theta_2, \sigma^2, m \mid y, W) \propto (\sigma^2)^{-T/2}
       \exp\Big[ -\frac{1}{2\sigma^2} (y - W\theta)'(y - W\theta) \Big] $$

We assume that the change point $m$ is independent of $\theta$
and $\sigma^2$, and that $m$ is equally likely at any
observation point between $k$ and $T-k$. We use a diffuse prior
on $\theta$ and $\sigma^2$, and thus the joint prior is given by

    $$ P_0(m, \theta, \sigma^2) \propto 1/\sigma^2 $$

Combining with the likelihood function we have the joint
posterior density function

    $$ P_1(\theta, \sigma^2, m \mid y, W) \propto (\sigma^2)^{-(T/2+1)}
       \exp\Big[ -\frac{1}{2\sigma^2} \{ S + (\theta - \hat{\theta})' W'W (\theta - \hat{\theta}) \} \Big] $$

Integrating out $\theta$ and $\sigma^2$ yields the marginal
posterior mass function

    $$ P_1(m \mid y, W) \propto |W'W|^{-1/2}\, S^{-(T-2k)/2} $$

where $|W'W| = |X_1'X_1|\,|X_2'X_2|$ and $S = S_1 + S_2$ with
$S_i = (y_i - X_i\hat{\theta}_i)'(y_i - X_i\hat{\theta}_i)$ and
$\hat{\theta}_i = (X_i'X_i)^{-1}X_i'y_i$. The marginal posterior
density function of $\theta$ is obtained as

    $$ P_1(\theta \mid y, W) \propto \sum_{m=k}^{T-k} P_1(\theta \mid m, y, W)\, P_1(m \mid y, W) $$

where $P_1(\theta \mid m, y, W)$ is the conditional posterior
density function of $\theta$ given $m$, which is a matrix t
density

    $$ P_1(\theta \mid m, y, W) \propto [1 + (\theta - \hat{\theta})' H (\theta - \hat{\theta})]^{-(\nu + 2k - 1)/2} $$

where $\nu = T - 2k + 1$ and $H = W'W/S$. The marginal posterior
densities of $\theta$ and $\sigma^2$ are mixtures of matrix t
densities, and the mixing proportion is the marginal posterior
probability mass function of the break point $m$. The marginal
posterior density function of $\theta_i$ is

    $$ P_1(\theta_i \mid y, W) \propto \sum_{m=k}^{T-k} P_1(\theta_i \mid m, y, W)\, P_1(m \mid y, W) $$

where $P_1(\theta_i \mid m, y, W)$ is the conditional posterior
density of $\theta_i$ given $m$, which is a univariate t
distribution

    $$ P_1(\theta_i \mid m, y, W) \propto \Big[ 1 + \frac{(\theta_i - \hat{\theta}_i)^2}{h^{ii}} \Big]^{-(\nu+1)/2} $$

where $h^{ii}$ is the $i$th diagonal element of $H^{-1}$. The
marginal posterior density $P_1(\theta_i \mid y, W)$ is thus a
weighted t distribution, with weights given by the posterior
probability mass function of $m$.

We are interested in the conditional probability mass
function of $m$ given a subset of parameters,
$\rho_1 = \rho_2 = 1$. We can partition
$\theta = [\theta_1' \;\vdots\; \theta_2']'$ where
$\theta_1 = (\rho_1, \rho_2)$, and $H$ is partitioned to conform
with the partitioning of $\theta$. Integrating out $\sigma^2$
from the joint posterior density of the parameters yields

    $$ P_1(\theta_1, \theta_2, m \mid y, W) \propto \big[ 1 + (\theta_1 - \hat{\theta}_1)'(H_{11} - H_{12}H_{22}^{-1}H_{21})(\theta_1 - \hat{\theta}_1)
       + \{ (\theta_2 - \hat{\theta}_2) + H_{22}^{-1}H_{21}(\theta_1 - \hat{\theta}_1) \}' H_{22}
         \{ (\theta_2 - \hat{\theta}_2) + H_{22}^{-1}H_{21}(\theta_1 - \hat{\theta}_1) \} \big]^{-T/2} $$

The conditional probability mass function of $m$ given
$\theta_1$ can be obtained by integrating out $\theta_2$,

    $$ P_1(m \mid \theta_1, y, W) \propto \big[ 1 + (\theta_1 - \hat{\theta}_1)'(H_{11} - H_{12}H_{22}^{-1}H_{21})(\theta_1 - \hat{\theta}_1) \big]^{-(T - k_2)/2} $$

where $k_2$ is the number of elements in $\theta_2$.
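
The marginal posterior mass function
$P_1(m \mid y, W) \propto |W'W|^{-1/2} S^{-(T-2k)/2}$ derived in
this section can be evaluated by looping over the admissible
break points. The following is a minimal sketch, assuming NumPy;
the function name `posterior_mass_m` is hypothetical, the
computation is done in logs for numerical stability, and regime 1
is taken to be the first m rows of the data.

```python
import numpy as np

def posterior_mass_m(y, X):
    """Marginal posterior mass function of the break point m under the flat prior.

    y : (T,) dependent variable; X : (T, k) regressors used in both regimes.
    Returns (ms, p) with ms = k, ..., T-k and p summing to one.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    T, k = X.shape
    ms = np.arange(k, T - k + 1)
    logp = np.empty(len(ms))
    for idx, m in enumerate(ms):
        log_det, S = 0.0, 0.0
        # Regime 1 uses observations 1..m, regime 2 uses m+1..T
        for Xi, yi in ((X[:m], y[:m]), (X[m:], y[m:])):
            XtX = Xi.T @ Xi
            theta_hat = np.linalg.solve(XtX, Xi.T @ yi)   # OLS within the regime
            resid = yi - Xi @ theta_hat
            S += resid @ resid                            # S = S_1 + S_2
            log_det += np.linalg.slogdet(XtX)[1]          # log|W'W| = sum of log|X_i'X_i|
        logp[idx] = -0.5 * log_det - 0.5 * (T - 2 * k) * np.log(S)
    p = np.exp(logp - logp.max())                         # rescale before normalizing
    return ms, p / p.sum()
```

A peak of the returned mass function in the interior of the
sample is then read as the estimated break point.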


3.2  A Monte  Carlo  Study 

In this section I report some Monte Carlo results on the
detection of an unknown break point. The results show that the
structural change is well detected in both the stationary and
nonstationary cases by the marginal posterior mass function of
m.

Consider the simple AR(1) model

    $$ y_t = \alpha_1 + \rho_1 y_{t-1} + e_t, \quad t = 1, \ldots, m $$
    $$ y_t = \alpha_2 + \rho_2 y_{t-1} + e_t, \quad t = m+1, \ldots, T $$

The derivations of the marginal and conditional posterior
distributions of the parameters are given in section 3.1.

The behavior of the marginal posterior mass function of m
is evaluated by simulation, fixing the two $\alpha$'s at 2 and
4, the $\rho$'s at 0.7, 0.95 and 1, and restricting
$0.1T \le m \le 0.9T$, with 100 replications. The posterior
distributions of m varied considerably across replications. To
get an overall view of the behavior of the distribution, we
averaged each value of the distribution over the 100
replications. Thus the distribution illustrated in the
following figures is not the exact distribution but its average
behavior in the simulation.

Figure 1 shows that the marginal posterior mass function of m
has its peak at the break point when samples are drawn with the
break point fixed at m* = 25 and the parameters at
$\alpha_1 = 2$, $\alpha_2 = 4$, $\rho_1 = \rho_2 = 1$ and
T = 50. The conditional posterior distribution of m given
$\rho_1 = \rho_2 = 1$ turns out to be smoother than the marginal
distribution. This means that the conditional distribution
detects the break point at m* = 25 in fewer replications than
the marginal distribution does.
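
The simulation design described above can be sketched as follows.
This is an illustrative reconstruction, not the original
simulation code: it assumes NumPy, standard normal errors and a
starting value $y_0 = 0$ (neither is stated in the text), and the
function names are hypothetical.

```python
import numpy as np

def posterior_mass_m(y, X):
    """P(m | y, W) proportional to |W'W|^(-1/2) * S^(-(T-2k)/2), for m = k..T-k."""
    T, k = X.shape
    ms = np.arange(k, T - k + 1)
    logp = np.empty(len(ms))
    for idx, m in enumerate(ms):
        log_det, S = 0.0, 0.0
        for Xi, yi in ((X[:m], y[:m]), (X[m:], y[m:])):
            XtX = Xi.T @ Xi
            resid = yi - Xi @ np.linalg.solve(XtX, Xi.T @ yi)
            S += resid @ resid
            log_det += np.linalg.slogdet(XtX)[1]
        logp[idx] = -0.5 * log_det - 0.5 * (T - 2 * k) * np.log(S)
    p = np.exp(logp - logp.max())
    return ms, p / p.sum()

def simulate_ar1_break(T, m_star, a1, a2, rho1, rho2, rng, y0=0.0):
    """Simulate y_0..y_T from an AR(1) whose parameters change after m_star."""
    y = np.empty(T + 1)
    y[0] = y0                                   # assumed starting value
    for t in range(1, T + 1):
        a, rho = (a1, rho1) if t <= m_star else (a2, rho2)
        y[t] = a + rho * y[t - 1] + rng.standard_normal()  # assumed N(0,1) errors
    return y

def average_posterior(T=50, m_star=25, a1=2.0, a2=4.0, rho1=1.0, rho2=1.0,
                      reps=100, seed=0):
    """Average the posterior mass function of m over `reps` simulated samples."""
    rng = np.random.default_rng(seed)
    total = None
    for _ in range(reps):
        y = simulate_ar1_break(T, m_star, a1, a2, rho1, rho2, rng)
        X = np.column_stack([np.ones(T), y[:-1]])   # constant and y_{t-1}
        ms, p = posterior_mass_m(y[1:], X)
        total = p if total is None else total + p
    return ms, total / reps
```

Calling `average_posterior()` with the default arguments mirrors
the design behind Figure 1; changing `rho1`, `rho2` or `m_star`
gives the other designs considered in this section.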

Figure 2 illustrates the marginal and conditional
distributions of m when the random samples have no break, with
$\alpha_1 = \alpha_2 = 2$ and $\rho_1 = \rho_2 = 1$. The
conditional distribution is flat over the whole sample period,
whereas the marginal distribution is flat in the interior of
the sample period and high at the beginning and end of the
sample.

For stationary series ($\rho_1 = \rho_2 = 0.7$), Figures 3 and
4 plot the marginal and conditional posterior distributions of
m. The marginal distribution has a sharp peak at the break
point in Figure 4. The conditional posterior distribution,
however, does not have a peak around the break point. In the
stationary case the conditional posterior distribution of m
given $\rho_1 = \rho_2 = 0.7$ detects the break point in fewer
replications, whereas the number of detections by the marginal
posterior distribution does not fall (it even increases).


[Figure 1: Marginal Posterior Mass Function of m, m*=25 and rho=1; axes: Probability, Period]

[Figure 2: Marginal Posterior Mass Function of m, No Break and rho=1; axes: Probability, Period]

[Figure 3: Marginal Posterior Mass Function of m, No Break and rho=0.7; axes: Probability, Period]

As the sample size grows, both the marginal and conditional
posterior distributions detect the break well in the stationary
and nonstationary cases alike. Figures 4 and 5 illustrate these
distributions for the nonstationary ($\rho_1 = \rho_2 = 1$; unit
root) and stationary ($\rho_1 = \rho_2 = 0.7$) cases,
respectively, when T = 100. Note that the peak probability of
the marginal distribution is higher in the stationary case than
in the nonstationary case.

When the autoregressive parameter $\rho$ changes from
stationary to nonstationary, the marginal posterior
distribution of m detects the break point very well. The
conditional posterior distribution of m given
$\rho_1 = 0.95$ (0.7) and $\rho_2 = 1$ shows a flat, low
probability over period 1 and a high probability over period 2.
Figures 6 and 7 show the cases of $\rho_1 = 0.7$, $\rho_2 = 1$
and $\rho_1 = 0.95$, $\rho_2 = 1$, respectively.

When the series changes from nonstationary to stationary, the
marginal posterior distribution again detects the break point
well. The conditional posterior distribution, however, behaves
asymmetrically and does not show any systematic pattern.
Figures 8 and 9 show the reverse cases of Figures 6 and 7,
respectively.
Until now we have assumed that the structural break occurs in
the middle of the sample period. Figure 10 shows six cases in
which the break occurs at 0.15T, 0.2T, 0.4T, 0.6T, 0.8T, and
0.85T, with $\alpha_1 = 2$, $\alpha_2 = 4$,
$\rho_1 = \rho_2 = 1$ and T = 50. Each peak indicates the
assumed break point exactly. From Figure 10 we can infer that
the marginal posterior mass function of m detects the
structural break sharply whenever a break occurs within the
sample period $0.1T \le m \le 0.9T$.

[Figure 4: Marginal Posterior Mass Function of m, m*=25 and rho=0.7]

[Figure 5: Marginal Posterior Mass Function of m, m*=50 and rho=0.7]

[Figure 6: Marginal Posterior Mass Function of m, m*=25, rho1=0.7 and rho2=1]

[Figure 7: Marginal Posterior Mass Function of m, m*=25, rho1=0.95 and rho2=1]

[Figure 8: Marginal Posterior Mass Function of m, m*=25, rho1=1 and rho2=0.7]

[Figure 9: Marginal Posterior Mass Function of m, m*=25, rho1=1 and rho2=0.95]

[Figure 10: Marginal Posterior Function of m, m*=8, 10, 20, 30, 40 and 43 and rho=1]

What happens if the break occurs at the beginning or the
end of the sample period, m* = 0.1T or m* = 0.9T? Figure 11
shows these cases. As expected, peaks are found at the
beginning and the end of the sample period. Comparing Figure
11 with Figures 2 and 3 indicates that it is hard to draw
inferences about the break point when the break occurs at
the beginning or the end of the sample period.

In  summary  the  Monte  Carlo  experiment  shows  the 
following: 

1) From the average behavior of the marginal posterior
distribution of m we can identify a break point as the peak
of the marginal posterior distribution of m within the
sample period, 0.1T < m < 0.9T.

2) The marginal posterior distribution of m detects a break
point well in both the stationary and the nonstationary
case.

3) When the constant changes, the marginal posterior
distribution of m detects the break point sharply. The
conditional posterior distribution of m given the $\rho$'s,
however, is not as good as the marginal posterior
distribution of m at detecting the break point, especially
in the stationary case.

4) When the autoregressive parameter changes, the marginal
posterior distribution of m detects the break point well.
The conditional posterior distribution of m shows higher
probabilities in the nonstationary period than in the
stationary period when the series changes from stationary to
nonstationary; when the series changes from nonstationary to
stationary, however, it shows no systematic behavior.

Figure 11. Marginal Posterior Mass Function of m; m*=5, 45 and rho=1
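
The Monte Carlo design summarized above can be sketched in a few lines of Python. This is not the code used for the experiments; it is a minimal illustration that assumes the standard flat-prior result for a regression with a single break: with prior proportional to 1/σ, integrating the regime coefficients and σ out of the Gaussian likelihood gives p(m | y) proportional to |X_1'X_1|^(-1/2) |X_2'X_2|^(-1/2) S_m^(-(T-2k)/2), where S_m is the pooled residual sum of squares and k is the number of regressors per regime. The function names, the seed and the minimum segment length are assumptions made for the example.

```python
import numpy as np

def simulate_ar1_break(T, m_star, alpha1, alpha2, rho1, rho2,
                       sigma=1.0, y0=0.0, seed=0):
    """Simulate y_t = alpha_i + rho_i * y_{t-1} + e_t with a break after m_star."""
    rng = np.random.default_rng(seed)
    y = np.empty(T + 1)
    y[0] = y0
    for t in range(1, T + 1):
        a, r = (alpha1, rho1) if t <= m_star else (alpha2, rho2)
        y[t] = a + r * y[t - 1] + sigma * rng.standard_normal()
    return y

def break_posterior(y, min_seg=3):
    """Marginal posterior mass function of the break point m (flat prior).

    Regime 1 covers observations t = 1..m, regime 2 covers t = m+1..T.
    Each regime has its own constant and autoregressive coefficient.
    """
    T = len(y) - 1
    X = np.column_stack([np.ones(T), y[:-1]])   # regressors: 1, y_{t-1}
    z = y[1:]
    k = X.shape[1]
    ms = np.arange(min_seg, T - min_seg + 1)    # candidate break points
    logp = np.empty(len(ms))
    for i, m in enumerate(ms):
        rss, logdet = 0.0, 0.0
        for Xs, zs in ((X[:m], z[:m]), (X[m:], z[m:])):
            beta = np.linalg.lstsq(Xs, zs, rcond=None)[0]
            resid = zs - Xs @ beta
            rss += resid @ resid
            logdet += np.linalg.slogdet(Xs.T @ Xs)[1]
        logp[i] = -0.5 * logdet - 0.5 * (T - 2 * k) * np.log(rss)
    w = np.exp(logp - logp.max())               # stabilize before normalizing
    return ms, w / w.sum()

if __name__ == "__main__":
    y = simulate_ar1_break(T=50, m_star=25, alpha1=2.0, alpha2=4.0,
                           rho1=1.0, rho2=1.0)
    ms, p = break_posterior(y)
    print("peak of the posterior mass function at m =", ms[np.argmax(p)])
```

Averaging `p` over many simulated series, rather than inspecting a single draw, would correspond to the "average behavior" referred to in point 1.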


CHAPTER  4 


DATA  SETS  USED  AND  PREVIOUS  RESULTS 


Under the unit root hypothesis, random shocks have a
permanent effect on the economy and fluctuations are not
transitory. A series of empirical analyses following
Nelson and Plosser (1982) basically confirmed that most
macroeconomic variables have a univariate time series
structure with a unit root. On the statistical front,
besides the standard unit root tests proposed by Dickey and
Fuller (1979) and Fuller (1976), several alternative
approaches to testing the unit root hypothesis emerged, for
example Phillips and Perron (1986), Campbell and Mankiw
(1987, 1988) and Cochrane (1986). Empirical applications of
these methodologies generally reaffirmed the conclusion that
most macroeconomic time series have a unit root.

Perron (1989) proposed modifying the standard
Dickey-Fuller test by including dummy variables in the
Dickey-Fuller regression to allow for a break in the trend
and the mean. He postulated that the 1929 Great Crash and
the 1973 oil price shock were not realizations of the
underlying data-generating mechanism of the various series
but could be modelled as exogenous events. He then computed
critical values appropriate to this modified regression and,
using the new critical values, found evidence in favor of a
structural break in a majority of the time series
investigated by Nelson and Plosser (1982). Perron concluded
that most macroeconomic time series, 11 of the 14 series
analyzed by Nelson and Plosser, are not characterized by the
presence of a unit root and that fluctuations are indeed
transitory.

Perron's basic idea is that tests for a unit root using
the full sample are biased in favor of accepting the unit
root hypothesis if the series has a structural break at some
intermediate date. An important criticism of the Perron
approach, raised by Christiano (1988), was that the break
date was assumed to be known. It is more reasonable not to
assume a priori knowledge of the break date but rather to
allow its estimation to be part of the empirical exercise.
This idea initiated unit root test procedures using
sub-samples. Examples include the recursive test procedures,
which use a sequence of statistics constructed by
incrementing the sample (Banerjee, Dolado and Galbraith,
1990; Banerjee, Lumsdaine and Stock, 1990; Zivot and Andrews,
1990; Hansen, 1990), and the sequential test procedures,
which use a sequence of statistics computed from the full
sample at each stage while varying the break date
(Christiano, 1988; Banerjee, Lumsdaine and Stock, 1990). The
empirical results from applying these test procedures to the
log of U.S. real GNP provided no evidence against the unit
root null, in contrast to Perron.
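
To make the idea of a sequential statistic concrete, the following Python sketch computes a Dickey-Fuller t-statistic for every candidate break date, using the full sample each time, and keeps the minimum. It is a simplified illustration and not the procedure of any of the papers cited above: it assumes a mean-shift dummy only, omits lag augmentation, and uses a 15% trimming fraction chosen for the example. The resulting statistic has a nonstandard distribution, so it would have to be compared with critical values tabulated for the specific procedure, which are not reproduced here.

```python
import numpy as np

def df_tstat(X, z, col):
    """OLS t-statistic for the hypothesis that coefficient `col` equals 1."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ z
    resid = z - X @ beta
    s2 = resid @ resid / (len(z) - X.shape[1])
    return (beta[col] - 1.0) / np.sqrt(s2 * XtX_inv[col, col])

def sequential_min_t(y, trim=0.15):
    """Minimum over break dates of the DF t-statistic with a mean-shift dummy.

    Regression for each candidate break k (full sample every time):
        y_t = a + b*t + c*1(t > k) + rho*y_{t-1} + e_t
    Returns (break index at the minimum, minimum statistic, all statistics).
    """
    y = np.asarray(y, dtype=float)
    z, ylag = y[1:], y[:-1]
    T = len(z)
    t = np.arange(1, T + 1)
    lo, hi = int(np.floor(trim * T)), int(np.ceil((1 - trim) * T))
    breaks = np.arange(lo, hi)
    stats = np.empty(len(breaks))
    for i, k in enumerate(breaks):
        shift = (t > k).astype(float)            # mean-shift dummy
        X = np.column_stack([np.ones(T), t, shift, ylag])
        stats[i] = df_tstat(X, z, col=3)
    j = int(np.argmin(stats))
    return int(breaks[j]), float(stats[j]), stats

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.cumsum(0.5 + rng.standard_normal(101))   # random walk with drift
    k, tmin, _ = sequential_min_t(y)
    print("minimum DF t-statistic", round(tmin, 3), "at break index", k)
```

A recursive statistic would instead re-estimate the regression without the dummy on samples of increasing length and track the t-statistic across those sub-samples.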

Christiano (1988) criticized Perron's assumption of a
priori knowledge of the break point and argued that the
choice of break date should be treated as a function of the
data. He showed that the computed significance level can
differ drastically depending on how the break date is
chosen. The researcher's break date selection algorithm can
then be used to compute the significance level of the test
statistic for a break. He suggested the maximal F and
minimum significance level techniques as a pre-test
examination of the data. Christiano reported that, once the
critical values are adjusted for the various break date
selection procedures, a variety of test statistics reveal no
evidence against the null hypothesis of no trend break in
the post-war U.S. GNP data of Nelson and Plosser.

Banerjee, Dolado and Galbraith (1990) applied the
recursive minimum Dickey-Fuller t-test for the unit root
null and the sequential Dickey-Fuller t-test for the
alternative of a trend break, both proposed by Banerjee,
Lumsdaine and Stock (1990), to the Friedman and Schwartz
(F-S) (1982) annual series from 1869 to 1975 and the Nelson
and Plosser (N-P) (1982) annual series from 1909 to 1970.
Using the full sample, they could reject the null hypothesis
of a unit root for the F-S series but not for the shorter
N-P series. To address Perron-type criticism of using the
full sample in unit root tests, they computed the recursive
minimum D-F t-statistics of the two series from incrementing
sub-samples and compared them with the critical values in
Banerjee, Lumsdaine and Stock (1990). The recursive test
statistics come close to rejecting the unit root null. For
the alternative of a trend break against the null of no
break, the sequential statistics offered no significant
evidence of a break in either series. These results parallel
Christiano's (1988). They also cautioned that, because
normality of the F-S series' residuals was rejected,
applying those critical values might be hazardous.

Banerjee, Lumsdaine, and Stock (1990) applied several
recursive tests and sequential tests to data on postwar real
output for seven OECD countries. The following description
of the OECD data sets comes from Appendix B of Banerjee,
Lumsdaine, and Stock (1990).

Data for the United States are GNP from Citibase, for
1947:1 to 1989:2. The data for the six countries come from
two sources, the OECD Main Economic Indicators database
maintained by Data Resources, Inc. (DRI), and Moore and
Moore (1985). In most cases, two series have been spliced
together to construct a longer time series of data. Where
this has involved an adjustment because the real series are
indexed to different base years, they have been adjusted
using the earliest available ratio of the two series.

The Canada data are GNP, with 1948:1 to 1960:4 from
Moore and Moore and 1961:1 to 1989:2 from DRI. The France
data are GDP, 1963:1 to 1989:2, and are from DRI. The French
data contain a large negative spike (a strike) in 1968:2; we
eliminated this spike by linearly interpolating the value
for this quarter. The data for Germany are GNP, with 1950:1
to 1959:4 from Moore and Moore and 1960:1 to 1989:2 from
DRI. The data for Italy from DRI were nominal rates, so we
have used GDP from Moore and Moore for 1952:1 to 1982:4. The
GNP data for Japan is from Moore and Moore for 1952:1 to
1964:4 and from DRI for 1965:1 to 1989:2. The data for the
UK are GDP at Factor Cost and are from DRI for 1960:1 to
1989:2.

They first computed the standard Dickey-Fuller
statistics and found that the hypothesis of one unit root
could not be rejected at the 25% level for any of the seven
countries. As Perron (1989) emphasized, however, if the
trend-shift/stationary model is correct, conventional unit
root test statistics will incorrectly fail to reject the
unit root null. They therefore proposed two types of
recursive tests using incrementing sub-samples: the
Dickey-Fuller t-statistics and the modified Sargan-Bhargava
statistics. For all countries but Italy, the recursive
statistics provide no evidence against the unit root null.
For Italy the modified Sargan-Bhargava statistics show
evidence against the unit root null, but the authors
reported poor size performance for that test statistic. For
the alternatives of a trend break, Banerjee, Lumsdaine and
Stock constructed three sequential tests for three different
alternatives (trend-shift, mean-break, and a broken drift)
against the unit root null.

Banerjee, Lumsdaine and Stock's results for the USA
indicate no rejection of the unit root null against any of
the alternative hypotheses. For Canada, the sequential test
statistics indicate that the unit root null is rejected
against the mean-shift/stationary alternative, with the
break in 1981:3. They thus interpreted the recession of the
early 1980s as a permanent downward shift in trend growth;
after the recovery, output is again stationary along its
original growth path. In 1968:2, France experienced a major
strike. They used two sets of data, the original data and
data interpolated over that period, and reported that the
results were the same for both. France seems better
characterized as being integrated, with a reduction in the
rate of growth of output appearing around 1974. For Germany
their findings indicated that the unit root null was not
rejected, but support for the constant-drift/unit-root null
against the mean-shift/unit-root alternative was less
strong. However, this may arise because the earliest
observations, near the end of World War II, might have
unusually large measurement error. For Italy there is some
evidence in favor of the mean-shift/unit-root alternative,
around the time of the first oil shock, against the
constant-drift/unit-root null. For Japan the unit root null
is rejected against the trend-shift/stationary alternative,
with the break in 1969:4, but the broken-drift/unit-root
alternative with a break at 1973:2 was also favored.
Banerjee, Lumsdaine and Stock's results for Japan thus admit
two possible hypotheses: the trend-shift/stationary model
with the break in 1969:4 or the broken-drift/unit-root model
with the break in 1973:2. For the UK the results provide no
evidence against the unit root null. They also detected that
the growth rate increased in the 1980s, although not
significantly so.

Zivot and Andrews (1990) proposed a minimum D-F t-test
that endogenizes the break point selection procedure and
used it to reanalyze the data series considered by Perron.
They started from the observation that plots of drifting
unit root processes are often very similar to plots of
processes that are stationary about a broken trend for some
break point. Using asymptotic distributions that account for
the estimated break point, they found less evidence against
the unit root hypothesis than Perron found for many of the
data series. They reversed Perron's conclusions for five of
the eleven N-P series for which he rejected the unit root
null at the 5% level. They also reversed Perron's unit root
rejection for the postwar quarterly real GNP series from
1947:1 to 1986:3, which comes from the Citibase databank.
However, for some of the series (industrial production,
nominal GNP, and real GNP), they rejected the unit root
hypothesis even after endogenizing the break point selection
procedure. For these series, their results provided stronger
evidence against the unit root hypothesis than that given by
Perron. Note that Zivot and Andrews' (1990) conclusion about
the annual N-P GNP series differs from the findings of
Banerjee, Dolado and Galbraith (1990). Recall the different
critical values of the min-t statistics in Banerjee,
Lumsdaine and Stock (1990) and Zivot and Andrews mentioned
in Section 2.2.2.


CHAPTER  5 


RESULTS  FROM  THE  BAYESIAN  PERSPECTIVE 


5.1.  Introduction 


When unit roots are present, Bayesian and classical
approaches to inference diverge substantially. The
asymptotic distribution theory changes discontinuously
between the stationary and unit root cases. Confidence
regions based on asymptotic theory will frequently be
disconnected because of this discontinuity. From the
Bayesian perspective, Sims (1988, p. 467) argued:

It  has  long  been  recognized  that  Bayesian 
inference  concerning  parameters  of  linear  time 
series  models,  conditional  on  the  initial  values 
of  the  observed  sample  and  Gaussian  disturbance 
distributions,  encounters  no  special  difficulties 
for  the  case  of  unit  roots.  The  likelihood,  and 
hence  the  posterior  p.d.f.  for  a flat  prior,  is 
Gaussian  in  shape  regardless  of  whether  or  not 
there  are  unit  (or  even  explosive)  roots.  This 
simple  flat-prior  Bayesian  theory  is  both  a more 
convenient  and  a logically  sounder  starting  place 
for  inference  than  classical  hypothesis  testing. 

We will discuss the debate between the Bayesian and
classical approaches to the unit root case, and alternatives
to the flat prior, in Chapter 6.

DeJong and Whiteman (1989) adopted the Likelihood
Principle to identify the type of prior an investigator
would need in order to support the inference of integration
for the series considered by Nelson and Plosser (1982). The
classical Dickey-Fuller test conducted in Nelson and Plosser
involved comparing an estimate of the AR root, $\hat\rho$, to the
likelihood function $\mathcal{L}(\hat\rho \mid \rho=1)$. Values of $\hat\rho$ in the
lower tail constitute evidence against the
difference-stationary (DS) hypothesis. Bayesian posterior
analysis involves consideration of $\mathcal{L}(\hat\rho \mid \rho)$ to determine
which values of $\rho$ are most likely to have generated the
observed data (i.e. generated $\hat\rho$). If substantial posterior
probability is associated with values of $\rho$ near unity, the
DS inference is supported. They found that this sort of
inference was not supported by the data analyzed by Nelson
and Plosser and that the prior required to support it was
excessively sharp: it involves assigning zero probability to
the alternative hypothesis that the series are
trend-stationary. DeJong and Whiteman infer from their
results that evidence in support of a stochastic trend is
present for only two of the 14 series.

In this chapter I follow the Likelihood Principle
approach from the Bayesian perspective and attempt to detect
structural change in the time series used in Nelson and
Plosser (1982) and Banerjee, Lumsdaine, and Stock (1990).
The simple flat-prior Bayesian method applies in the same
way whether or not there is a unit root in the time series.
To identify a structural break we adopt the Monte Carlo
result of Section 3.2: if there is a break, a peak in the
marginal posterior distribution of m appears within the
sample period.


5.2. Empirical Results

I apply the flat-prior Bayesian analysis of Chapter 3
to the Nelson-Plosser (N-P) and Friedman-Schwartz (F-S)
real per capita GNP series for the U.S.A. and to the
quarterly OECD data described in Chapter 4.

For each of the nine series we obtain the marginal
posterior mass function of m, the marginal posterior
distribution of $\rho$, and the conditional posterior
distribution functions of $\rho$ given a break point from the
augmented Dickey-Fuller regression model with four lags.
Table 9 reports the posterior probabilities. The figures in
columns 2 and 5 to 8 are the posterior probabilities of the
near-nonstationary set, $P(\rho \ge 0.98)$. The probabilities in
columns 2, 7 and 8 are calculated from the marginal
posterior distribution of $\rho$, and those in columns 5 and 6
from the conditional posterior distribution functions of $\rho$
given a break point. Columns 3 and 4 describe the break.
Column 3 shows the location of the peak of the posterior
mass function of m; when the peak is found within the
sample period, it is taken as a break point, and its
posterior probability is shown in column 4. When the peak is
found at the beginning or end of the sample period, it is
not regarded as a break point.

Table 9

Posterior Probabilities

          Marg. Dist.  Prob. of Break     Cond. Dist.    Marg. Dist.
Series    All          Peak       Prob.   1*     2**     1*     2**

N-P       .017         beginning  -       -      -       -      -
F-S       .003         end        -       -      -       -      -
USA       .096         1981:3     .169    .055   .001    .128   .001
Canada    .235         1981:2     .391    .017   .006    .028   .102
France    .675         1969:4     .245    .105   .008    .063   .029
Germany   .643         beginning  -       -      -       -      -
Italy     .909         1973:2     .295    .005   .003    .082   .069
Japan     .997         1973:2     .155    .336   .003    .252   .011
UK        .095         end        -       -      -       -      -

*  The sample period before the break.
** The sample period after the break.
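
As an illustration of how a probability such as $P(\rho \ge 0.98)$ can be obtained, the sketch below uses the textbook flat-prior result that, in a Gaussian linear regression with prior proportional to 1/σ, a single coefficient has a Student-t marginal posterior centered at the OLS estimate. It is a simplified stand-in for the calculation behind Table 9, under stated assumptions: the regression contains only a constant, a trend and $y_{t-1}$ (no lagged differences), the sample is treated as a single regime with no break, and the tail probability is approximated on a grid. The function name and grid settings are hypothetical.

```python
import numpy as np

def prob_near_unit_root(y, threshold=0.98, n_grid=20001):
    """Flat-prior posterior probability P(rho >= threshold).

    Model: y_t = a + b*t + rho*y_{t-1} + e_t, prior proportional to 1/sigma.
    The marginal posterior of rho is Student-t with nu = n - k degrees of
    freedom, location rho_hat and scale s * sqrt([(X'X)^{-1}]_rho,rho).
    """
    y = np.asarray(y, dtype=float)
    z = y[1:]
    n = len(z)
    X = np.column_stack([np.ones(n), np.arange(1, n + 1), y[:-1]])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ z
    resid = z - X @ beta
    nu = n - X.shape[1]
    s2 = resid @ resid / nu
    rho_hat = beta[2]
    scale = np.sqrt(s2 * XtX_inv[2, 2])
    # Evaluate the t kernel on a uniform grid and take the ratio of sums.
    grid = np.linspace(rho_hat - 15 * scale, rho_hat + 15 * scale, n_grid)
    kernel = (1.0 + ((grid - rho_hat) / scale) ** 2 / nu) ** (-(nu + 1) / 2)
    return float(kernel[grid >= threshold].sum() / kernel.sum())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    walk = np.cumsum(rng.standard_normal(201))       # unit root series
    print("P(rho >= 0.98) for a random walk:", prob_near_unit_root(walk))
```

Applying the same function separately to the observations before and after a given break date would give probabilities analogous in spirit to the sub-period columns, although the marginal figures in Table 9 also average over the unknown break point.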

From the posterior probabilities in column 2 we can
infer that evidence in support of a stochastic trend is
present for five OECD countries but not for the other two,
the USA and the UK. For the N-P series the result is
consistent with DeJong and Whiteman (1989). Banerjee,
Dolado, and Galbraith (1990) found evidence rejecting a unit
root for the F-S series. For the OECD countries other than
the USA and the UK, our results parallel those of Banerjee,
Lumsdaine, and Stock (1990), whose standard Dickey-Fuller
tests show strong evidence of a stochastic trend for all
seven countries.

As is clear from columns 3 and 4, the USA and Canada
seem to have experienced a structural change during the
recession of the early 1980s (the second oil price shock).
The 1973 oil price shock seems to have caused a structural
change in Italy and Japan. For France the major strike of
1968 might be a reason for the break in output. On the
identification of a break, our results for Canada and Japan
are similar to those of Banerjee, Lumsdaine and Stock. They,
however, detected several different break points for a
single country depending on the alternative hypothesis.

Columns 5 to 8 show that the posterior probabilities
for Canada, France, and Italy provide some evidence
supporting Perron's hypothesis. The outputs of those
countries look like nonstationary series, but our results
provide evidence for the trend-shift/stationary model. For
Japan, output before the 1973 oil price shock seems to be
characterized as integrated, but after the shock it is
better characterized as trend-stationary. Real per capita
GNP for the USA seems to be a trend-stationary series whose
trend changed around the early 1980s recession (the second
oil price shock). Output for Germany seems better
characterized as integrated with no break during the sample
period. The results for the UK provide no evidence for a
stochastic trend. Only for Canada do our results parallel
those of Banerjee, Lumsdaine, and Stock. Finally, our
results for Japan help resolve their ambiguous conclusion.


5.3.  Probability  of  a Break  and  Model  Specification 

In this section I show that the posterior distribution
of an unknown break point, m, differs across model
specifications. In the previous sections of this chapter I
used the augmented Dickey-Fuller (ADF) regression model, to
allow comparison with the results of the classical approach:

$$ y_t = \alpha + \beta t + \rho y_{t-1} + d(L)\Delta y_{t-1} + e_t $$

where four lags were included in $d(L)$.

When I specified the model in different ways, I obtained
different marginal posterior distributions of the break
point. Consider the simple AR(1) model, which excludes the
trend term and the lags of $\Delta y_{t-1}$ from the augmented
Dickey-Fuller model, and the AR(1) with trend model, which
adds a trend term to the AR(1).

Table 10 shows the break points detected by the three
different models. Columns 2, 3 and 4 report the results
from the AR(1), AR(1) with trend, and ADF(4) models,
respectively. For the USA, the first row of Table 10 shows
that no break is detected by the AR(1) model, but when a
trend is added to the AR(1) model there is some evidence of
a break at 1981:3. Compared with the ADF(4) case, the
marginal posterior distribution of m is not the same, but
the highest probability occurs in the same period.

Similarly, for Canada the AR(1) model suggests no break
(the peak probability is very low, less than 0.1), but both
the AR(1) with trend model and the ADF(4) model indicate a
very sharp peak at 1981:2. For France the AR(1) model does
not provide strong evidence for a break, but both the AR(1)
with trend and the ADF(4) models provide evidence of the
same break at 1969:4. For Germany the marginal posterior
distributions of m from the AR(1) and ADF(4) models indicate
no high peak except at the beginning or end of the sample
period, while that from the AR(1) with trend model shows
mild evidence for a break at 1957:1. For Italy the AR(1)
model implies no break, but the AR(1) with trend model
indicates a break point at 1974:2, though not strongly.
Including lags of the series in the model provides strong
evidence for a break at the time of the 1973 oil price
shock. For Japan the AR(1) model and the AR(1) with trend
model provide some evidence for a break at 1959:1, but after
adding lags of the series the highest probability occurs at
1973:2. For the UK, the AR(1) and ADF(4) models show no
break, but the AR(1) with trend model indicates relatively
strong evidence for a break at 1979:2.

Table 10

Posterior Probabilities of a Break

          AR(1)            AR(1) with Trend   ADF(4)
Series    Peak     Prob.   Peak     Prob.     Peak       Prob.

USA       end      -       1981:3   .162      1981:3     .169
Canada    1961:1   .081    1981:2   .488      1981:2     .391
France    1967:4   .100    1969:4   .424      1969:4     .245
Germany   end      -       1957:1   .136      beginning  -
Italy     end      -       1974:2   .129      1973:2     .295
Japan     1959:1   .197    1959:1   .353      1973:2     .155
UK        end      -       1979:2   .423      end        -

In  summary  we  found  the  following: 

1) The simple AR(1) model indicates no evidence for a break
for any country except Japan.

2) The AR(1) with trend model provides some evidence for a
break for all countries.

3) For none of the seven OECD countries do all three models
identify the same break.
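
The dependence of the break-point posterior on the specification can be explored with a sketch like the following, which builds the regressor matrix for each of the three models and evaluates the same flat-prior posterior mass function of m for each. As before, this is an illustration under assumptions rather than the code behind Table 10: every coefficient is allowed to change at the break, the posterior formula is the standard flat-prior one, p(m | y) proportional to |X_1'X_1|^(-1/2) |X_2'X_2|^(-1/2) S_m^(-(n-2k)/2), and the specification labels `ar1`, `ar1_trend` and `adf4` are hypothetical names.

```python
import numpy as np

def design(y, spec):
    """Return (X, z) for spec in {'ar1', 'ar1_trend', 'adf4'}."""
    y = np.asarray(y, dtype=float)
    p = 4 if spec == "adf4" else 0            # number of lagged differences
    dy = np.diff(y)                           # dy[i] = y[i+1] - y[i]
    idx = np.arange(p + 1, len(y))            # time index t of each usable row
    cols = [np.ones(len(idx))]
    if spec in ("ar1_trend", "adf4"):
        cols.append(idx.astype(float))        # linear trend
    cols.append(y[idx - 1])                   # y_{t-1}
    for j in range(1, p + 1):
        cols.append(dy[idx - j - 1])          # y_{t-j} - y_{t-j-1}
    return np.column_stack(cols), y[idx]

def break_posterior(X, z):
    """Flat-prior posterior mass function of the break row m."""
    n, k = X.shape
    ms = np.arange(k + 1, n - k)              # each regime keeps more than k rows
    logp = np.empty(len(ms))
    for i, m in enumerate(ms):
        rss, logdet = 0.0, 0.0
        for Xs, zs in ((X[:m], z[:m]), (X[m:], z[m:])):
            beta = np.linalg.lstsq(Xs, zs, rcond=None)[0]
            r = zs - Xs @ beta
            rss += r @ r
            logdet += np.linalg.slogdet(Xs.T @ Xs)[1]
        logp[i] = -0.5 * logdet - 0.5 * (n - 2 * k) * np.log(rss)
    w = np.exp(logp - logp.max())
    return ms, w / w.sum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.cumsum(0.3 + rng.standard_normal(120))
    for spec in ("ar1", "ar1_trend", "adf4"):
        X, z = design(y, spec)
        ms, p = break_posterior(X, z)
        print(spec, "peak row:", ms[np.argmax(p)], "prob:", round(p.max(), 3))
```

Because the three specifications have different numbers of regressors, both the admissible range of m and the exponent on the residual sum of squares change, which is one mechanical reason the peaks need not coincide.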


CHAPTER  6 


ALTERNATIVE  PRIORS  FOR  THE  ANALYSIS 
OF  THE  AUTOREGRESSIVE  MODEL 


In this chapter we discuss the criticism of classical
unit root inference by Sims (1988) and the alternative
priors for Bayesian methodology proposed by Phillips (1990).
Sims' basic criticism of classical inference for a unit root
is that, because the asymptotic distribution theory changes
discontinuously between the stationary and unit root cases,
classical hypothesis testing based on that theory cannot
deliver reasonable procedures for inference. Phillips argued
that when the ignorance prior is used instead of the flat
prior, the posterior distribution turns out to have a
bimodal shape; Bayes confidence sets would then be disjoint
and are therefore formally analogous to those generated by
classical methods.

Basically, Sims argued that the simple flat-prior
Bayesian theory is both a more convenient and a logically
sounder starting place for inference than classical
hypothesis testing. Phillips, however, emphasized the need
to develop a new asymptotic theory for a unit root, because
his alternative ignorance prior produced the same problem
even in Bayesian inference.

Here we review Phillips' ignorance prior and provide
some Monte Carlo results comparing the two priors. Phillips
argued that since the true value of $\rho$ influences the
autocorrelation structure of the time series, flat priors
neglect generic information, namely the anticipated amount
of information carried by the data about $\rho$. Strictly
speaking, this is a property of the likelihood function, and
the prior should have nothing to do with this argument.
Phillips suggested what he calls an objective ignorance
prior as follows:

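$$ P_0 \propto |I_{\theta\theta}|^{1/2} $$

where $I_{\theta\theta}$ is the information matrix. This is the prior
derived by Jeffreys (1961) using the invariance principle.¹
Consider the simple AR(1)

$$ y_t = \rho y_{t-1} + e_t, \qquad e_t \sim \text{iid } N(0, \sigma^2). $$

The flat prior is

$$ P_0 \propto 1/\sigma $$

and the ignorance prior can be expressed in Phillips'
version as follows:

$$ P_0 \propto (1/\sigma)\, I_{\rho\rho}^{1/2} $$

where

$$
I_{\rho\rho} =
\begin{cases}
\dfrac{T}{1-\rho^2} - \dfrac{1}{1-\rho^2}\,\dfrac{1-\rho^{2T}}{1-\rho^2}
  + \dfrac{y_0^2}{\sigma^2}\,\dfrac{1-\rho^{2T}}{1-\rho^2}, & \rho \neq 1 \\[2ex]
\dfrac{T(T-1)}{2} + T\left(\dfrac{y_0}{\sigma}\right)^2, & \rho = 1
\end{cases}
$$

¹ See also Zellner (1971), p. 47.
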


To obtain the posterior distribution we multiply the
prior by the likelihood function. But the likelihood
function already contains all the information in the data.
The posterior distribution under the ignorance prior
suggested by Phillips may therefore be biased, because the
prior gives exaggerated importance to $\rho$ based on the
structure of the time series.
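
A short numerical sketch makes the size of this reweighting visible. The function below evaluates the expected information about $\rho$ in the AR(1) model with a fixed initial value, i.e. the sum over t = 1, ..., T of E[y_{t-1}^2] divided by σ², which gives T/(1-ρ²) - (1-ρ^{2T})/(1-ρ²)² + (y_0/σ)²(1-ρ^{2T})/(1-ρ²) for ρ² ≠ 1 and T(T-1)/2 + T(y_0/σ)² at ρ = 1. The function name, the argument `y0_over_sigma` and the tolerance used to switch to the unit root branch are assumptions made for the example.

```python
import numpy as np

def info_rho_rho(rho, T, y0_over_sigma=0.0, tol=1e-6):
    """Expected information about rho in y_t = rho*y_{t-1} + e_t (y_0 fixed)."""
    rho = float(rho)
    d = 1.0 - rho * rho
    if abs(d) < tol:
        # Unit root branch: limit of the closed form as rho^2 -> 1.
        return T * (T - 1) / 2.0 + T * y0_over_sigma ** 2
    g = (1.0 - rho ** (2 * T)) / d        # sum of rho^(2j), j = 0..T-1
    return T / d - g / d + y0_over_sigma ** 2 * g

if __name__ == "__main__":
    T = 50
    for rho in (0.0, 0.5, 0.9, 1.0, 1.05):
        weight = np.sqrt(info_rho_rho(rho, T) / info_rho_rho(0.0, T))
        print(f"rho = {rho:4.2f}   prior weight relative to rho = 0: {weight:8.2f}")
```

With T = 50 and y_0 = 0 the information is 49 at ρ = 0 and 1225 at ρ = 1, so the ignorance prior weights ρ = 1 five times as heavily as ρ = 0, and the weight keeps growing for explosive values.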


Phillips' ignorance prior upweights large values of $\rho$
based on the anticipated asymptotic volume of confidence
regions, which will be tighter when $|\rho| \ge 1$. Sims (1988,
p. 469) argued:

In the simplest autoregressive model we expect the
standard error of estimate of the OLS estimator $\hat\rho$
in a sample of given size to be smaller the closer
is $\rho$ to 1 (because the sum of squared lagged y's
will tend to be larger relative to $\sigma^2$ for larger
$\rho$'s). This by itself makes it more likely that a
given observed $\hat\rho$ is a spuriously high estimate
generated by a smaller $\rho$ than that it is a
spuriously low estimate generated by a larger $\rho$.
But this does not skew the likelihood toward lower
$\rho$'s because the distribution of $\hat\rho$ is itself
skewed to the left for $\rho$'s near 1, which by
itself would make it more likely that a given
observed $\hat\rho$ is spuriously low than it is
spuriously high. The classical theory focuses
entirely on this latter effect, paying no
attention to the danger that we can be misled into
giving too much credence to large $\rho$ values
because of the more erratic behavior of estimates
from models with lower $\rho$ values.

It seems that Phillips suggested his ignorance prior in
order to match the conclusions obtained from the Bayesian
approach to those derived from the classical approach when
unit roots are present. The likelihood is not skewed toward
lower $\rho$'s, so the only source of skewness in the posterior
is the prior distribution. Phillips' ignorance prior
compensates for this skewness to the left for $\rho$'s.

I conducted a small Monte Carlo experiment for the case
T = 50 with 20,000 replications, as Phillips (1990) did. The
marginal posterior distribution of $\rho$ varied quite a bit
across replications. Figure 12 shows the typical shapes of
those distributions when $\rho=1$. Sometimes, however, the
marginal posterior distribution under the ignorance prior
turns out to be bimodal, as Figure 13 shows. One of
Phillips' main arguments for the ignorance prior concerns
the disjoint confidence sets in the unit root case. He
argued that because of the bimodality of the marginal
posterior distribution under the ignorance prior, Bayes
confidence sets would be disjoint and are therefore formally
analogous to those generated by classical methods.

I did the same Monte Carlo experiment for the case ρ = 0.5.
Figure 14 shows the typical shape of those distributions. Even in
the stationary case, the marginal posterior distribution from the
ignorance prior turns out to be bimodal in most of the 20,000
replications. Sometimes its shape is as in Figure 15. Figures 14
and 15 imply that the ignorance prior gives too much weight to the
nonstationary case even when the true value of ρ is 0.5.

Figure 12. Marginal Posterior Distribution of rho (rho = 1)

Figure 13. Marginal Posterior Distribution of rho (rho = 1)

Figure 14. Marginal Posterior Distribution of rho (rho = 0.5)

Figure 15. Marginal Posterior Distribution of rho (rho = 0.5)

In the normal linear regression model, flat priors lead to
Bayesian confidence sets that are equivalent to those of the
corresponding sampling theory. Phillips (1990, p. 8) used ignorance
priors in place of flat priors and then argued:

. . . Bayesian  posteriors  for  the  autoregressive 
coefficient  are  frequently  bimodal  and  lead  to  disjoint 
confidence  sets,  just  as  those  based  on  classical 
sampling  theory  asymptotics. 

As we saw above, even in the stationary case Phillips' ignorance
prior produced bimodal posterior distributions, which are at odds
with the results of the corresponding sampling theory. In the
stationary case, ignorance priors do not represent prior
information in any meaningful sense.

For evidence on the tightness of confidence regions, Tables 11,
12, and 13 show the results of Monte Carlo experiments when ρ = 1,
ρ = 0.95, and ρ = 0.5, respectively. The first rows of Tables 11,
12, and 13 show that the confidence regions from the flat prior are
much tighter than those from the ignorance prior, even in the case
of a unit root. The second rows of Tables 11, 12, and 13 indicate
the downward bias of the flat prior and the upward bias of the
ignorance prior. The size of the bias from the ignorance prior,
however, seems to be bigger than that of the flat prior. From an
inspection of the third row of Table 11 and the fourth rows of
Tables 12 and 13, the posterior means from ignorance priors seem to
be less biased than those from flat priors. However, as we
discussed above, comparing the posterior means is meaningless
because of the bimodality of the marginal posterior distribution of
the autoregressive coefficient from the ignorance priors. The last
rows of Tables 11, 12, and 13 give the difference between the
posterior means and the posterior modes. These figures imply that
the bimodality from the ignorance priors occurs very frequently as
ρ deviates from 1.

Table 11

Simulation Results (Null: Rho = 1.0)

                        Flat Prior         Ignorance Prior
                     Expect.    Var.       Expect.    Var.
P(.95<Rho<1.05)      0.6872    0.0861      0.6351    0.0794
P(1.0<Rho)           0.4253    0.0869      0.6457    0.0520
Posterior Mean       0.9669    0.0035      1.0116    0.0023
|Mean - Mode|        0.0025    0.0000      0.0192    0.0012

Range                 Max.      Min.        Max.      Min.
P(.95<Rho<1.05)      1.0000    0.0005      1.0000    0.0008
P(1.0<Rho)           1.0000    0.0001      1.0000    0.0512
Posterior Mean       1.0975    0.5304      1.5464    0.5904
|Mean - Mode|        0.0771    0.0000      0.6475    0.0000

Table 12

Simulation Results (Null: Rho = 0.95)

                        Flat Prior         Ignorance Prior
                     Expect.    Var.       Expect.    Var.
P(.90<Rho<1.0)       0.5396    0.0521      0.3945    0.0304
P(.95<Rho)           0.4379    0.0808      0.6661    0.0489
P(1.0<Rho)           0.1692    0.0229      0.4572    0.0317
Posterior Mean       0.9144    0.0051      0.9872    0.0040
|Mean - Mode|        0.0025    0.0000      0.0421    0.0026

Range                 Max.      Min.        Max.      Min.
P(.90<Rho<1.0)       0.8508    0.0001      0.7103    0.0002
P(.95<Rho)           1.0000    0.0000      1.0000    0.0091
P(1.0<Rho)           0.9567    0.0000      0.9970    0.0090
Posterior Mean       1.0625    0.4239      1.4797    0.4392
|Mean - Mode|        0.0800    0.0000      1.2714    0.0000

Table 13

Simulation Results (Null: Rho = 0.50)

                        Flat Prior         Ignorance Prior
                     Expect.    Var.       Expect.    Var.
P(.45<Rho<0.55)      0.2457    0.0087      0.1997    0.0067
P(.50<Rho)           0.4875    0.0819      0.5777    0.0752
P(1.0<Rho)           0.0004    0.0000      0.1617    0.0157
Posterior Mean       0.4818    0.0145      0.7135    0.0525
|Mean - Mode|        0.0041    0.0001      0.2157    0.0251

Range                 Max.      Min.        Max.      Min.
P(.45<Rho<0.55)      0.3974    0.0001      0.3922    0.0000
P(.50<Rho)           1.0000    0.0003      1.0000    0.0016
P(1.0<Rho)           0.1000    0.0000      0.9924    0.0003
Posterior Mean       0.8627    0.0893      2.5054    0.0931
|Mean - Mode|        0.1132    0.0000      1.5061    0.0000
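
The row quantities reported in Tables 11, 12, and 13 can be computed from a discretized posterior along the following lines. This is only a sketch under assumptions of my own: the posterior is given as probability masses on a grid of ρ values, the function names are hypothetical, and the variance across replications is taken to be the population variance, which the text does not specify.

```python
import numpy as np

def summarize_posterior(grid, probs, lo, hi, threshold):
    """Summaries of one discretized posterior of rho (probs must sum to 1)."""
    mean = float(np.sum(grid * probs))
    mode = float(grid[np.argmax(probs)])
    return {
        # posterior mass in the open interval (lo, hi), e.g. (.95, 1.05)
        "p_interval": float(probs[(grid > lo) & (grid < hi)].sum()),
        # posterior mass strictly above the threshold, e.g. P(1.0 < rho)
        "p_above": float(probs[grid > threshold].sum()),
        "mean": mean,
        # a large gap between mean and mode hints at a second mode or fat tail
        "abs_mean_minus_mode": abs(mean - mode),
    }

def aggregate(summaries):
    """Expectation, variance, max and min of each summary over replications."""
    out = {}
    for key in summaries[0]:
        values = np.array([s[key] for s in summaries])
        out[key] = {
            "expect": float(values.mean()),
            "var": float(values.var()),  # population variance (an assumption)
            "max": float(values.max()),
            "min": float(values.min()),
        }
    return out
```

Applying `summarize_posterior` to the posterior from each replication and passing the resulting list to `aggregate` would give the "Expect.", "Var.", "Max." and "Min." columns for one prior.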

Sims (1988, p. 467) suggested a better classical approach:

... Using Monte Carlo small sample distribution
theory ... One can generate the joint distribution
of test statistics of interest for a number of
parameter points near the likelihood-maximizing
one and compare the likelihood of the observed
sample under the various estimated joint
distributions. ... doing it systematically
eventually leads back to a Bayesian framework.

Sims also argues that the analytical difficulties of the classical
inferential procedures, even in simple cases, prevent our making
progress on the real issues, for example nonnormality of
disturbances and proper accounting for the evidence about
parameters contained in initial conditions.
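
The procedure Sims describes can be illustrated with a rough sketch. The assumptions are mine rather than Sims's: the test statistic is the OLS estimate of ρ in a driftless AR(1) with y_0 = 0 and standard normal errors, and the likelihood of the observed estimate under each candidate ρ is approximated by the share of simulated estimates that fall within a small window around it. All names and the window width are hypothetical.

```python
import numpy as np

def simulated_likelihood(rho_hat_obs, candidates, T, n_rep, half_width, rng):
    """Approximate likelihood of an observed OLS estimate under each candidate rho.

    For every candidate, simulate n_rep AR(1) samples of length T, compute the
    OLS estimate in each, and return the fraction of estimates lying within
    half_width of rho_hat_obs.
    """
    shares = []
    for rho in candidates:
        u = rng.standard_normal((n_rep, T))
        y = np.zeros((n_rep, T + 1))          # column 0 is y_0 = 0
        for t in range(1, T + 1):
            y[:, t] = rho * y[:, t - 1] + u[:, t - 1]
        num = (y[:, 1:] * y[:, :-1]).sum(axis=1)
        den = (y[:, :-1] ** 2).sum(axis=1)
        est = num / den                        # OLS estimate per replication
        shares.append(np.mean(np.abs(est - rho_hat_obs) <= half_width))
    return np.array(shares)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    candidates = [0.85, 0.90, 0.95, 1.00]
    shares = simulated_likelihood(0.95, candidates, 50, 5000, 0.02, rng)
    for rho, share in zip(candidates, shares):
        print(f"rho = {rho:.2f}: share near observed estimate = {share:.4f}")
```

Normalizing these shares across the candidate values gives something that behaves like a posterior under a flat prior on the candidates, which is the sense in which doing this systematically leads back to a Bayesian framework.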


CHAPTER  7 


CONCLUSION 


The two crucial parameters in the models considered are m, the
break point, and ρ, the autoregressive parameter. Monte Carlo
studies of detecting a structural change in the autoregressive
model show that the Bayesian posterior mass function of m detects a
break point more readily than the classical approach, even when the
series is nonstationary. When a peak of the marginal posterior mass
function of m occurs within the sample period, it indicates a break
point. The Bayesian methodology and the results of the Monte Carlo
experiments are applied to the data sets analysed by Banerjee,
Dolado, and Galbraith (1990) and Banerjee, Lumsdaine, and Stock
(1990).

An inspection of the marginal posterior distribution of ρ using
the full samples shows evidence of a stochastic trend for the five
OECD countries, but not for the Nelson-Plosser and
Friedman-Schwartz annual series for the US or the quarterly series
for the US and the UK. The results for the N-P and F-S series are
consistent with those obtained by DeJong and Whiteman (1989) and
differ from those obtained by the classical approach. For the five
OECD countries (Canada, France, Germany, Italy, and Japan), with
the exception of Germany, using Bayesian inference based on the
marginal posterior mass function of m, I found strong evidence
supporting Perron's hypothesis even after endogenizing the break
point selection procedure. The results for four of the seven OECD
countries (Canada, France, Italy, and Japan) show that standard
tests of the unit root hypothesis were biased in favor of accepting
the unit root hypothesis if the series had a structural break at
some intermediate date. The results for Japan provide some evidence
that the time series on output changed from nonstationary to
stationary around the 1973 oil price shock. This may explain the
ambiguous conclusions that Banerjee, Lumsdaine, and Stock reached
among several alternatives for Japan.

Recently Phillips (1990) criticized the simple flat-prior
Bayesian approach and suggested an alternative ignorance prior
based on Jeffreys' principle of invariance. The ignorance-prior
Bayesian approach suggested by Phillips, which results in the same
inference as that of the classical approach, uses the sample
information by giving more weight to large values of ρ. Phillips
argued that the ignorance prior tightens the confidence regions
near the true value. The results of my small Monte Carlo experiment
provide evidence against Phillips' argument. My empirical
applications of the flat-prior Bayesian approach to a structural
break lead to conclusions that diverge from those of the classical
approach.